Chapter 1
Introduction 



The purpose of this project is to compare the accuracy and consistency of 
human eyeball-based methods of morphologically classifying galaxies with 
that of automatic methods. In particular, galaxy types assigned by humans 
using the well known Hubble 'Tuning Fork' system are compared with cor- 
responding types output by artificial neural network (ANN) programs. 

What is the point of classifying galaxies? Of course, galaxy classifica- 
tion is not an end in itself, but a first step towards a greater understanding 
of the physics of galaxies. Many examples exist in science where a good 
classification system has led to a much improved understanding. A classic 
example is the periodic table of the elements: the ordering of elements by 
their observable properties showed patterns which led to the models of the 
structure of the atom, and to predictions of new elements which were sub- 
sequently discovered. Another example is the classification of lifeforms into 
species. If a classification system for an observed phenomenon is based on 
continuously varying parameters and those parameters are in turn shown to 
be important to the theories which attempt to explain the phenomenon then 
the classification system helps quantify the connection between the theory 
and the observations. 

The study of galaxies, their formation and evolution is still a science 
which is in its infancy, which is why people are still trying to perform basic 
classification. The youth of the science is illustrated by the fact that before 
the mid 1920s it was not even known for sure whether galaxies are separate 
systems outside our own Milky Way, or whether they were simply another 
type of nebula in our own galaxy. Although the idea had been suggested 
by Immanuel Kant in 1755, and the spiral nebulae clearly observed by Lord 
Rosse in the mid nineteenth century, it wasn't until such events as the 'Great 
Debate' in 1920, the discovery by Edwin Hubble of Cepheid variable stars 
in the Andromeda galaxy in 1923, giving a measure of its distance which 
vastly exceeded previous estimates, and Hubble's discovery of the expansion
of the universe in 1929 that it was accepted that galaxies are indeed separate 
'island universes', comparable in size to our galaxy. 

Once it was realised that galaxies are indeed separate systems, a logical 
step was to try to find patterns in their properties. Of course, the 'nebulae' 
had already been studied in detail since Lord Rosse's time, but nebulae in 
our galaxy were also included in the same studies. In the 1920s astronomy 
was confined to the optical waveband (radio astronomy did not begin until 
the 1930s and other wavebands were not exploited until even later), so the 
properties of galaxies that were studied were their visual appearance and their 
spectra. It was clear that galaxies came in different types, i.e. elliptical, spiral 
and irregular, and much further detail within their structure was visible in 
nearby examples, so it was inevitable that some sort of classification system 
would be set up. What is remarkable is that the system that was set up by 
Hubble in 1926, though only based on the appearances of these bright nearby 
galaxies, has proven to be so useful. No clearly superior system to describe 
observed galaxies has yet been developed. 

For the classification to be meaningful, the parameters must be physically 
motivated, or be shown to be correlated in a way which might be explainable 
by a physical model. A galaxy's appearance, spectrum etc. could simply be 
described, and patterns could be found, but the idea is to find a physical 
model which explains the patterns. Ultimately a complete theory should 
mean that any classification system is entirely objective.

A good analogy for the ideal galaxy system is the Hertzsprung-Russell 
diagram used in stellar astronomy. This plots stellar absolute magnitude 
against colour (or equivalently luminosity against temperature) and shows 
well-known correlations such as the main sequence and the red giant branches. 
These have since been explained by detailed models of stellar evolution. The 
hope is that an analogous discovery process can occur for galaxies, to explain 
their formation and evolution into the forms seen today. Another motivation 
for classifying galaxies is to provide catalogues of objects for further study, 
and, at low redshift, a comprehensive base from which comparisons can be 
made with objects at higher redshift, where properties such as detailed
morphology cannot be observed.

The structure of this thesis is as follows: in Chapter 2 important classifica- 
tion systems are reviewed. They are important because several independent 
human experts are trained in and can therefore use these systems to classify 
galaxies. They provide the basis for comparisons between humans and auto- 
mated systems, the aim of the project. In particular the de Vaucouleurs 'T' 
system, based on the Hubble system, has been found to be useful. Chapter 3 
explains the increasing amount of data available to astronomers, the
consequent need for automated classifications and why ANNs are the best solution.
ANNs are then described, concentrating on the concepts which are impor- 
tant in their application to this project. This is followed by the results of the 
comparisons in Chapter 4. No human classifications are undertaken in the 
project, but instead a number of neural networks are constructed and their 
outputs compared to various sets of eyeball classifications and human-ANN 
comparisons published in the literature. In particular, the eyeball classifica- 
tions of Shimasaku et al. (2001), who classify 456 galaxies from the Sloan 
Digital Sky Survey (SDSS) commissioning data, are used. The networks are 
also applied to the SDSS Early Data Release. Possible extensions to the 
work, including the important extension to classifying galaxies spectrally, 
are described in Chapter 5, followed by conclusions. A general listing for the 
network programs is included as Appendix B. 



Chapter 2 

Classification Systems 



2.1 The Hubble System 

By far the best-known classification system, the Hubble (or Mount
Wilson) system was first devised by Edwin Hubble (Hubble 1926), and has 
remained essentially the same to the present. Some subsequent modifications 
were made by him (Hubble 1936) and the system is given its definitive de- 
scription by Sandage in the Hubble Atlas of Galaxies (Sandage 1961). It is 
most recently fully described in The Carnegie Atlas of Galaxies (Sandage & 
Bedke 1994). The system divides nearby bright galaxies in the visual wave- 
band into ellipticals, lenticulars, spirals, barred spirals and irregulars to give 
the famous Tuning Fork diagram (figure 2.1). 




Figure 2.1: The Hubble Tuning Fork Diagram, including examples
of each main galaxy type (from Gene Smith's Astronomy Tutorial,
http://casswww.ucsd.edu/public/tutorial/Galaxies.html)

In the original 1926 system the parameters are, for ellipticals, the
ellipticity e, given by $e = 10(a - b)/a$ where $0 \le e \le 7$ and a and b are the projected
major and minor axes on the sky; for spirals both the central concentration 
of light and the tightness of the spiral arms define the sequence Sa-Sb-Sc. 
Type Sa galaxies have large nuclei and strongly wound smooth spiral arms. 
Type Sc have small nuclei and loosely wound arms which are generally more 
patchy, i.e. resolved into stars and HII (ionised hydrogen) regions. Sb's are 
intermediate. The presence or absence of a bar gives the other prong of the 
tuning fork, and the barred galaxies are divided in a similar way into
SBa-SBc. Irregular galaxies, i.e. those with no obvious structure, are given as a
separate class. In 1936, S0 and SB0 (lenticular and barred lenticular) galaxies
were added (Hubble 1936) and suggested as approximately a transition type 
between E7 and Sa/SBa. The nature of this transition is still controversial. 
The poorly resolved Sc-Irr range was subdivided more finely by later workers, and in
1959 the extended sequence became the main axis of the de Vaucouleurs three dimensional system
(§2.3), consisting of E0-S0-Sa-Sb-Sc-Sd-Sm-Im-Irr, with finer subdivisions
S0-, S0, S0+, Sab, etc. (de Vaucouleurs 1959). This system is also known as the
Revised Hubble System. Type Sd is what would previously have been
designated late Sc (i.e. nearer to Irr than to Sb), and Sm and Im are further
subdivisions reflecting the division of irregular galaxies into barred and un- 
barred forms (e.g. the Large Magellanic Cloud has a weak bar structure and 
is thus of type Im; Im stands for Irregular Magellanic). 
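
The ellipticity parameter defined at the start of this section can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and rounding e to the nearest integer to obtain the E0-E7 label is an assumption made here, not something stated in the text.

```python
def hubble_ellipticity(a: float, b: float) -> float:
    """Return e = 10 (a - b) / a for projected major axis a and minor axis b."""
    if a <= 0 or b < 0 or b > a:
        raise ValueError("expected a > 0 and 0 <= b <= a")
    return 10.0 * (a - b) / a


def elliptical_class(a: float, b: float) -> str:
    """Label an elliptical galaxy E0..E7.

    Assumption for illustration: e is rounded to the nearest integer
    and capped at 7, the flattest class in the sequence.
    """
    e = hubble_ellipticity(a, b)
    return f"E{min(7, round(e))}"


if __name__ == "__main__":
    # A circular image (a == b) and one with axis ratio b/a = 0.7.
    print(elliptical_class(1.0, 1.0))
    print(elliptical_class(10.0, 7.0))
```

Because a and b are projected axes, the label describes the apparent shape on the sky and not necessarily the true three dimensional shape.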

Although only based on the optical appearance of bright nearby galaxies, 
the Hubble Sequence has proven to correlate very well with many more gen- 
eral physical parameters of these galaxies. Combined with the fact that no 
clearly superior classification system has been found, this is why the Hubble 
system is so well-known and extensively used today. Examples of correlations 
include (from Lahav et al. 1995) integrated colour, dynamical properties
such as stellar velocity dispersions and rotation curves, mass in free neutral
hydrogen (HI) and, more broadly, overall galaxy mass and luminosity (see
table 2.1). The colour is particularly significant because, integrated over
a galaxy, it reflects the mean spectral type of its stellar population.
This connects with theories of galaxy formation and evolution, which predict 
galaxy properties such as their dynamics and star formation history. Further 
important parameters are the concentration parameter and the bulge to disk 
ratio, which measure the relative concentration of light towards the centre of 
the galaxy. Elliptical galaxies are diskless and the ratio changes through to 
Sc galaxies and beyond which appear bulgeless. The bulges are dominated 
by old red stars and the disks by young blue stars. Thus the bulge to disk 
ratio and concentration parameter correlate with Hubble type and overall 
galaxy colour. The concentration parameter is described in Chapter 4. 



Table 2.1: Galaxy characteristics correlate with classification (from Gene Smith's Astronomy Tutorial,
http://casswww.ucsd.edu/public/tutorial/Galaxies.html).

                         | E0-E7                 | S0           | Sa           | Sb   | Sc          | Irr
-------------------------|-----------------------|--------------|--------------|------|-------------|---------------------------
Nuclear Bulge            | "All bulge", no disk  | Bulge & disk | Large        |      | Small       | None
Spiral Arms              | None                  | None         | Tight/smooth |      | Open/clumpy | Occasional traces
Gas (mass)               | Almost none           | Almost none  | ~1%          | 2-5% | 5-10%       | 10-50%
Young Stars, HII Regions | None                  | None         | Traces       |      | Lots        | Dominates appearance
Stars                    | All old (~10^10 yr)   | Old          | Some young   |      |             | Mostly young, but some old
Spectral Type            | G-K                   | G-K          | G-K          | F-K  | A-F         | A-F
Color                    | Red                   | Red          |              |      |             | Blue
Mass (M☉)                | 10^8-10^13            |              | (More) 10^12 |      | 10^9 (Less) | 10^8-10^11
Luminosity (L☉)          | 10^6-10^11            |              | (More) 10^11 |      | 10^8 (Less) | 10^8-10^11

However, the Hubble system is by no means perfect. It does not form
a complete system for describing all galaxies, or even the majority by number,
since most galaxies are small and faint, like the majority of stars.
Some parameters are mixed up along the length of the sequence, for example 
bulge size, spiral arm smoothness and tightness all change from Sa to Sc as 
described above. The ellipticity e is a function of projection and true galaxy 
triaxiality. The system shows no correlation with disk galaxy morphological 
properties in the near-infrared (Block et al. 1999), where dust obscuration is 
around 10% of that in the optical. Block et al. describe the dust in the visual 
as a 'mask' of insignificant dynamical mass which obscures galaxy disks. The 
Hubble type therefore does not correlate with the dynamical mass distribu- 
tion in spiral galaxy disks in the infrared. The system does not explain the 
forms of galaxies at moderate and high redshift, i.e. z > 0.1, where galaxy 
evolution becomes significant. Further problems include the fact that
galaxies in rich clusters are not well described, being mostly S0 or elliptical, so
that their types are poorly resolved. All types of irregular galaxies are lumped
together. Low surface brightness galaxies, dwarf spheroidals and spirals, and 
other unusual types such as amorphous galaxies (e.g. M82) or cD galaxies 
(see §2.2), are not described by the system (van den Bergh 1998). Active 
galaxies (AGN, quasars) are also not described, although these only form a 
small percentage of galaxies in the local universe. Those undergoing mergers 
are often recognisable as Hubble types, but are hard to quantify as such. 
Many of these problems are because the system was not designed to address 
them, but this does not alter its incompleteness. 

The Hubble system is important in this project because it forms the main 
axis of the de Vaucouleurs 3D system which in turn maps directly onto his 
numerical one-parameter 'T' system, described in §2.3.1. This is the system 
used for the comparisons between humans and artificial neural networks here 
and in the literature. Indeed, it is the only system which could practically be 
used for such a comparison, since it is the only one many independent experts 
are familiar with, and there is no obviously better system to learn and use. 
It is therefore worth making the comparisons in spite of the fact that it will 
not be the sought after final objective method of classifying galaxies. 

2.2 The Yerkes System 

This system, devised by W.W. Morgan (1958) is important because it uses 
the central concentration of light in a galaxy as its fundamental parameter. 
This has been shown to correlate with the mean stellar spectral type in 
galaxies, and in spiral galaxies reflects the ratio of the bulge (old red stars) 
to the disk (light dominated by young blue stars). The radii at which certain
percentages of the total flux are received from the galaxy are found, and their 
ratios taken (exact definitions of the concentration index can vary, and the 
ones used in this project are given in Chapter 4). Like the T parameter, the 
index is one dimensional, and is thus useful in the automated classification 
systems used today. For example, the ratio r_50/r_90 (50%/90% of flux) is
commonly used (again see Chapter 4), and is employed in this project and in 
the literature. The parameter is also useful because it can be measured for 
galaxies to a much greater distance (smaller image) than can any detailed 
morphology. 

In the Yerkes system, the galaxy types are designated a-af-f-fg-gk-k in 
order of increasing concentration (a=spiral type Sc or irregular, k=elliptical). 
Secondary 'form families' are added (ESBILND, where E=elliptical, S=spiral, 
B=barred, I=irregular, L=low surface brightness, N=bright or active nuclei, 
D=rotationally symmetric), and a number is added to give the flattening of 
the galaxy. A later form addition is a subclass of particularly luminous D 
galaxies called cD, giving the well known cD type of huge spherical galaxies 
found in the centres of rich galaxy clusters. cD's do not fit in the Hubble 
system. 

2.3 de Vaucouleurs' Extension to the Hubble 
System 

Described in de Vaucouleurs (1959), and illustrated in figure 2.2, this system
is also known as the Revised Hubble system. It extends the tuning fork
to three dimensions, and has orthogonal axes of the Hubble stage
(E-S0-...-Sd-Im), the 'family' SA-SB (ordinary-barred), and 'variety' S(s)-S(r)
(s-shaped to ringed). The Hubble stage axis maps onto the T system (de
Vaucouleurs et al. 1991). This is reasonable because it has been shown that
the family and variety axes have a much less significant correlation with
physical properties than the stage axis (van den Bergh 1998). The stage axis is
not perfect, because at the spiral-irregular end luminosity and colour effects 
are mixed up and mapped onto the single parameter (van den Bergh 1998). 



2.3.1 The de Vaucouleurs T System 

This is the system used in the comparisons in this project. It has turned
out to be a useful system because it is a one-to-one mapping between the
letter-based Hubble stages, which are the one system that many different
experts have experience with, and the integer T types, which can be output
by a neural network as a classification (either as integers or a continuous
sequence). The system was used in the only extensive study carried out
for comparison between humans and between humans and artificial neural
networks (Nairn et al. 1995a,b, Lahav et al. 1995, 1996). As mentioned
above, the parameter is not perfect, but for comparison with classifications
by several humans it is the only practical solution. The Hubble and T systems
are shown in table 2.2.

Figure 2.2: The de Vaucouleurs 3D system (de Vaucouleurs 1959), with the
Hubble stage axis running through ellipticals, lenticulars, spirals and
irregulars. The size of the family and variety axes reflects the extent to
which the features are present at that Hubble stage. The family and variety
axes are not thought to be significant in terms of the physics of galaxy
formation and evolution.



Table 2.2: Hubble type versus T type in de Vaucouleurs' extension to the
Hubble system. N.B.: cE = compact elliptical, cI = compact irregular, I0 =
irregular: non-Magellanic/amorphous, Pec = peculiar/unidentified

T Type      | -6  | -5 | -4  | -3  | -2  | -1  | 0    | 1  | 2   | 3
Hubble Type | cE  | E0 | E+  | S0- | S0  | S0+ | S0/a | Sa | Sab | Sb

T Type      | 4   | 5  | 6   | 7   | 8   | 9   | 10   | 11 | 90  | 99
Hubble Type | Sbc | Sc | Scd | Sd  | Sdm | Sm  | Im   | cI | I0  | Pec
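
The correspondence in table 2.2 is naturally expressed as a lookup table. The sketch below is an illustration only: the dictionary and function names are hypothetical, and snapping a continuous network output to the nearest integer type on the -6 to 11 sequence is an assumption about how such an output might be read back as a letter type.

```python
# Hubble stage -> T type, as listed in table 2.2.
HUBBLE_TO_T = {
    "cE": -6, "E0": -5, "E+": -4, "S0-": -3, "S0": -2, "S0+": -1,
    "S0/a": 0, "Sa": 1, "Sab": 2, "Sb": 3, "Sbc": 4, "Sc": 5,
    "Scd": 6, "Sd": 7, "Sdm": 8, "Sm": 9, "Im": 10, "cI": 11,
    "I0": 90, "Pec": 99,
}

# The mapping is one-to-one, so it can be inverted directly.
T_TO_HUBBLE = {t: name for name, t in HUBBLE_TO_T.items()}


def nearest_hubble_type(t_value: float) -> str:
    """Return the letter type whose T value is closest to t_value.

    Only the continuous part of the sequence (-6..11) is searched; the
    special codes 90 and 99 are excluded (an assumption of this sketch).
    """
    sequence = [t for t in T_TO_HUBBLE if -6 <= t <= 11]
    nearest = min(sequence, key=lambda t: abs(t - t_value))
    return T_TO_HUBBLE[nearest]


if __name__ == "__main__":
    print(nearest_hubble_type(3.2))
```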



2.4 Other Systems 

Many other systems have been devised. Some are purely descriptive, and 
therefore not claimed to be actual classification systems. They can be 
very detailed (examples are Wolf (1908), which includes nebulae, and the
Vorontsov-Velyaminov et al. (1962) Morphological Catalogue of Galaxies).
Others only cover certain types of galaxies, or simply extend the Hubble
sequence, for example van den Bergh (1960) on spiral arm luminosity classes,
although this covers such aspects as the correlation between arm type ('grand
design' or flocculent) and luminosity, which the Hubble system does not
correlate with. Still other systems have been superseded. Parameters such
as the central concentration of light and, perhaps less usefully in hindsight, 
many very detailed aspects of optical spiral arm morphology, turn up in many 
of these systems. The spiral arm morphology details will probably be more 
useful when detailed galaxy formation and dynamics are better understood. 

Another important set of systems are those using purely spectral param- 
eters. As mentioned, the integrated spectrum of a galaxy reflects its stellar 
population and galaxies of similar morphological type tend to have similar 
stellar populations. Hence spectral and morphological types correlate quite 
well. As with concentration index, spectral types can be measured to great
distances. The first examples are Humason (1936) and Morgan & Mayall 
(1957), with many more since (e.g. Folkes et al. 1996, which also uses artifi- 
cial neural networks). 

More details of all the systems described in this chapter, and of further 
systems, are given in van den Bergh (1998) and the history by Sandage 
(1975). 

However, no matter how good the system, it is still of very limited use
if it requires a human to manually classify each individual galaxy, because
modern sky surveys produce data on orders of magnitude more galaxies than
can be classified by hand. Automatic systems are needed, and these are
described in the next chapter.



Chapter 3 

Artificial Neural Networks 
(ANNs) 



3.1 The Need For Automation 

There are two reasons why automatic classification of galaxies is desirable: 

• The amount of data produced by modern sky surveys is simply too 
large for humans to classify manually 

• Humans are not objective and consistent classifiers

3.1.1 The Increasing Amount of Data

Data, both galaxy images and masses of accompanying parameters, are
becoming available in quantities vastly too large to classify manually.
The increase is essentially technology driven. For example, CCDs have the
advantage of a linear response to light intensity, and their data are stored
directly in digital form. They are also up to 90% efficient at collecting photons,
as opposed to the few percent efficiency of photographic plates. These
attributes, plus the exponentially increasing computer processing power
available, allow the larger datasets to be of more uniform properties and of higher
quality than potentially variable photographic plates.

Table 3.1 illustrates the increasing amount of data which has become 
available, with some landmark surveys. The number of galaxies surveyed 
and the number of redshifts available (which give the distance to a galaxy) 
are given. 

Table 3.1: Some important optical surveys to illustrate the increasing amount of data available. Notes: [1] Many of
the nebulae are galaxies, others are e.g. star clusters, planetaries; the catalogue was extended from 103 to 110 later.
[2] Again many of which are galaxies - note that the external nature of galaxies was not agreed upon until the late
1920s. References: 1771-1967 - Seitter (1987), Fairall (1997); RC1 - de Vaucouleurs et al. 1964; CfA1 - Davis et al.
1982; APM - Maddox et al. 1990; LCRS - Shectman et al. 1996; 2dF - Colless et al. 2001 (astro-ph 0106498); SDSS
- York et al. 2000; VISTA - http://www.vista.ac.uk

Date(s)    | Survey                                                            | A.k.a. | No. galaxy images      | No. redshifts
-----------|-------------------------------------------------------------------|--------|------------------------|--------------
1771       | Messier - Messier Catalogue                                       | M1-M103 | 103 nebulae [1]       |
1888       | Dreyer - New General Catalogue                                    | NGC    | 4630+ objects [2]      |
1914       | Slipher - taking galaxy spectra                                   |        |                        | 13
1934       | Hubble (in the Realm of the Nebulae)                              |        | c. 44,000              | 100+
1956       | Humason, Mayall & Sandage                                         |        |                        | 800+
1964       | Reference Catalogue of Bright Galaxies                            | RC1    |                        | < 1,500
1967       | Lick Observatory survey on the distribution of galaxies           |        | 1,000,000+             |
1982       | Centre for Astrophysics Redshift Survey                           | CfA1   |                        | 2,437
1985-1990  | Automatic Plate Measuring machine Galaxy Survey                   | APM    | c. 2,000,000           |
1988-1994  | Las Campanas Redshift Survey                                      | LCRS   |                        | 26,418
1996-2001  | Anglo Australian Observatory 2 degree field Galaxy Redshift Survey | 2dFGRS | Based on APM (467,214) | c. 250,000
2000-2004  | Sloan Digital Sky Survey                                          | SDSS   | c. 50,000,000          | c. 1,000,000
2004-2016+ | Visible and Infrared Survey Telescope for Astronomy               | VISTA  | 50Tb of images         |

Classifying galaxies manually takes time, because a human expert must
look at each individual image and decide which class the galaxy belongs to,
or whether it is uncertain. Examples of the largest manually classified catalogues
are the Third Reference Catalogue of Bright Galaxies (RC3, de Vaucouleurs
et al. 1991), with nearly 18,000 galaxies, and the European Southern
Observatory photometric catalogue ESO-LV (Lauberts & Valentijn 1989) with
more than 15,000 (numbers from Nairn et al. 1995a). These took several
years to compile. The new surveys (see table 3.1) will have images of up to 50
million galaxies, so an automatic classification system is vital. The
system must be reliable, i.e. classify with known, quantified and acceptably
small errors, and fast, so that the millions of galaxies can be processed on
a reasonable timescale. Once the image parameters have been determined
(also by automatic programs), automated algorithms can classify hundreds
of galaxies per second, so millions of galaxies would take a few hours. The
table also shows that even data from the 1930s have not been classified
and parameterised as fully as they could be, so many of the datasets already
available remain largely unexplored.
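
The timescale quoted above follows from simple arithmetic, sketched below. The specific rates are assumed values chosen to span "hundreds per second"; they are not measurements from this project.

```python
def classification_hours(n_galaxies: int, rate_per_second: float) -> float:
    """Wall-clock hours needed to classify n_galaxies at a steady rate."""
    return n_galaxies / rate_per_second / 3600.0


if __name__ == "__main__":
    # Assumed illustrative rates; the text only says "hundreds per second".
    for rate in (100.0, 500.0):
        hours = classification_hours(1_000_000, rate)
        print(f"{rate:.0f} galaxies/s: {hours:.1f} h per million galaxies")
```

At 100 galaxies per second a million galaxies take a little under three hours, which is consistent with the "few hours" estimate.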

A significant recent dataset is the Sloan Digital Sky Survey Early Data 
Release (SDSS EDR, Stoughton et al. 2001). This includes over 120 parame- 
ters for each of 13,804,448 objects, around half of which are galaxies. In this 
project, simple networks were applied to a subset of this data (see Chapter 
4). 

3.1.2 Human Objectivity 

Automatic systems are also desired because they are more objective, i.e. not 
subject to the conscious and unconscious prejudices which affect humans 
in looking at galaxy images, no matter how well intentioned. For example 
someone who has always used the same system is biased towards looking 
for parameters which define the system, even though these parameters may 
be essentially arbitrary or totally empirical. Another example is the fact 
that the human brain has evolved to see patterns very easily, and sometimes 
see patterns which are not really there (the man who saw the tiger that 
wasn't there survived while the one who didn't see the one that was there 
did not). So spurious patterns in images could easily be picked up.
Automatic systems, while by no means perfect, are at least more quantifiable, 
and although extraction of features from images is by no means a simple 
problem, it can be done reliably, and quantitatively. 

3.1.3 The Solution

Objective classifiers can be constructed from conventional statistics, such as 
principal component analysis (PCA), but these are limited to being linear, 
and thus have to make linear approximations. Artificial neural networks 
(ANNs) allow the implementation of nonlinear statistics, which better re- 
flect the underlying distribution of galaxy parameters. In combination with 
the increased consistency and size of the new digital sky surveys becoming 
available, as described above, much new information can be found. ANNs 
are now described. 

3.2 Introducing ANNs 

In this section artificial neural networks (ANNs) are introduced. The net- 
works used for classifying galaxies involve one or more layers of neurons, 
with the galaxy parameters as the input to the first layer and the classi- 
fication as the output of the final layer. The best networks from previ- 
ous studies used multiple layers. How ANNs work is described by begin- 
ning with a single neuron, generalising to a layer of neurons and then to 
many layers. The expressions used retain their form at each stage, but the 
scalars for a single neuron with a single input become vectors and matri- 
ces for the more complex arrangements. Additional terms such as weight- 
ing between layers also appear. Many other types of neural network be- 
sides the multilayer type exist, but these are not described in detail, be- 
cause they have not been used in human-ANN comparisons. This does not 
mean that they could not be used, however. Some examples are given in
the following. See also Bishop (1995), or the Matlab documentation at
http://www.mathworks.com/access/helpdesk/help/toolbox/nnet/nnet.shtml.

Artificial Neural Networks (ANNs) are structures which mimic the
neural structure of the brain, in the sense that they consist of neurons acting as
nodes which are able to carry out processing. The neurons are interlinked by 
one or more connections to other neurons. The connections can be weighted 
to alter the output from one neuron before it is input to the next. Whilst the 
networks may mimic the brain in this basic sense, they in no way approach 
the same level of complexity. The human brain has of order 10^11 neurons,
each with a few to a few thousand connections, but the networks used for 
classifying galaxies only have up to hundreds of neurons, in a few layers with 
of the order of ten inputs and outputs, and the necessary number of weights. 
Thus it is more appropriate to think of them acting as a reasonably complex
nonlinear mapping than a 'miniature brain'.

The purpose of a classifier is to map a set of inputs onto the correct output 
(or outputs), the inputs being the parameters for the object in question and 
the output the classification. The classifier is trained on examples, or given 
a criterion for grouping data, and after this should be able to cope with 
new examples which it has not seen before. In the case of galaxies (and 
in most cases) the mapping is not simple but is nonlinear. ANNs are used 
because they are able to incorporate this nonlinearity, whereas conventional 
statistics are not. Linear classifiers do work, to some extent, but neural 
networks are better (e.g. Nairn et al. 1995b). Examples of linear classifiers 
are principal component analysis for extracting key parameters (see §3.7), or 
the Naive Bayes classifier, which uses basic Bayesian probabilities (Bazell & 
Aha 2001). 

As well as classifying galaxies, ANNs have been found useful for various 
other applications in astronomy. Examples are star-galaxy separation on 
images (determining whether an object is a star or a galaxy), stellar spectral 
classification, adaptive optics, scheduling observation time and time depen- 
dent applications such as predicting solar activity (reviews are Miller 1993 
and Storrie-Lombardi & Lahav 1994). 

3.3 Single Neuron 
3.3.1 One Input 

The simplest neuron has one input, one output, a weight, and some kind 
of transfer function which operates on the input to give the output. Also 
present is a bias, analogous to that in a DC electric circuit. The input is 
multiplied by the weight, the bias is added, and the transfer function operates 
on the total to give the output (see 'Multiple Inputs' below for more on 
the bias). It is the nonlinear transfer function which allows the network to 
incorporate nonlinearity. If the transfer function is linear then the network 
is also linear. This can be used to gauge the improvement in classification 
gained by allowing nonlinearity. Single neurons are shown in figure 3.1. 

Note: the equations in this section without the ij subscripts are modified
from those given on the Matlab site. The figures in this section are also from
the site, modified so that the equations are given separately. Figures 3.5 and
3.6 are slightly modified. The URL for the site is
http://www.mathworks.com/access/helpdesk/help/toolbox/nnet/nnet.shtml.

Figure 3.1: Single neurons without and with bias. The Σ represents the
addition of wp and b. In the figures, n = wp and n = wp + b respectively.

For the single neurons

$$a = f(wp) \quad \text{and} \quad a = f(wp + b)$$

where

a = output
f = transfer function
w = input weight
p = input parameter
b = bias
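
The single neuron expressions above translate directly into code. The following is a minimal Python sketch, not the Matlab implementation used in the project; the function names are hypothetical, and tanh is chosen as the default transfer function purely for illustration.

```python
import math


def linear(n):
    """Linear transfer function: the output equals the net input."""
    return n


def neuron(p, w, b=0.0, f=math.tanh):
    """Single-input neuron: a = f(w * p + b).

    p: input parameter, w: input weight, b: bias (0 gives the
    bias-free neuron), f: transfer function.
    """
    n = w * p + b  # net input, called n in figure 3.1
    return f(n)


if __name__ == "__main__":
    # Same weight and bias, linear versus nonlinear transfer function.
    print(neuron(2.0, 0.5, 1.0, f=linear))
    print(neuron(2.0, 0.5, 1.0, f=math.tanh))
```

Swapping f between a linear and a nonlinear function, with everything else fixed, is the kind of comparison that can be used to gauge what nonlinearity adds.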

Examples of transfer functions used in classifying galaxies are: 

• Hard limit: the output is 0 if the input is below a certain value, and
1 if the input equals or is above the value

• Linear: the output is the same as the input, or a linearly scaled
multiple of the input

• (Log) sigmoid: the output is between 0 and 1, scaling an input in
the range ±∞ by $1/(1 + e^{-n})$

• tanh: similar to sigmoid, given by $2/(1 + e^{-2n}) - 1$. It scales the output to
between -1 and 1.

In Matlab (see Chapter 4), these are called hardlim, purelin, logsig 
and tansig respectively. The functions are shown in figures 3.2-3.4. 
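As an illustration of the four transfer functions listed above, the following is a minimal Python sketch. The function names simply mirror the Matlab names; they are plain re-implementations written for this example, not calls into Matlab, and the hard-limit threshold of zero is an assumption (the text only says 'a certain value').

```python
import math

def hardlim(n, threshold=0.0):
    # 0 below the threshold, 1 at or above it (threshold of 0 assumed here)
    return 1.0 if n >= threshold else 0.0

def purelin(n):
    # linear: the output equals the input
    return n

def logsig(n):
    # maps (-inf, +inf) onto (0, 1); math.exp overflows for very large negative n
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    # maps (-inf, +inf) onto (-1, 1); algebraically equal to tanh(n)
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def neuron(p, w, b=0.0, f=logsig):
    # single-input neuron: a = f(wp + b)
    return f(w * p + b)

# Tabulate the four functions at a few net inputs n
for n in (-2.0, 0.0, 2.0):
    print(n, hardlim(n), purelin(n), round(logsig(n), 3), round(tansig(n), 3))
```

Passing `purelin` as `f` gives a purely linear neuron, which is the case used to gauge how much is gained by allowing nonlinearity.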

Clearly many other functions are possible, although it has been shown
that 'networks with biases, a sigmoid layer, and a linear output layer are
capable of approximating any function with a finite number of discontinuities'
(Matlab site). This approximation can be made arbitrarily good. Sigmoids
are also differentiable, which is required for the learning algorithm in
multilayer networks (see §3.6.2).

Figure 3.2: The hard limit neuron transfer function. Again n = wp or wp + b.

Figure 3.3: The linear transfer function.

Figure 3.4: The sigmoid transfer function.



3.3.2 Multiple Inputs 

A multiple input neuron is shown in figure 3.5. 



Figure 3.5: A multiple input neuron (w terms modified from Matlab site).

Here

a = f(Wp + b)

or

a = f\left(\sum_{i=1}^{R} w_i p_i + b\right)

Multiple input neurons are similar to a single input neuron, the only
difference being that there are R inputs. Each set of R inputs corresponds to
one galaxy, and represents a vector in R-dimensional space. There is a weight
for each input, giving w_1, w_2, ..., w_R, thus the weights also represent an
R-dimensional vector. The wp above becomes the dot product Wp (w_1 p_1 +
w_2 p_2 + ... + w_R p_R). The bias, still a scalar, is added to this product as above. If
the data points are plotted in R-dimensional space, then the bias allows the
line, plane etc. representing the boundary between two classes (the decision
boundary, figure 3.6) not to be forced through the origin. For example,
y = x has zero bias, y = x - 1 has a bias of -1. In two dimensions the
function is shifted left by b along the x axis. The vector represented by the
R weights is orthogonal to the decision boundary. Thus the weights define
the decision boundary, and hence which class each point belongs to. The
number of points exactly on the decision boundary should be negligible in
the case of classifying galaxies, and even if some were present, any error from
this is small compared with the intrinsic spread in the parameters and the
classifications. (Galaxies are complex!) A two dimensional decision boundary
is shown in figure 3.6.




Figure 3.6: A decision boundary in two dimensions (w terms modified from
Matlab site).
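The role of the weights and bias in fixing the decision boundary can be sketched for two inputs. The weights, bias and test points below are hypothetical values chosen so that the boundary is the line y = x - 1 used as an example in the text; a hard limit at zero is assumed for the class assignment.

```python
def net_input(w, p, b):
    # n = w_1 p_1 + w_2 p_2 + ... + w_R p_R + b
    return sum(w_i * p_i for w_i, p_i in zip(w, p)) + b

def classify(w, p, b):
    # hard limit at zero: class 1 on or above the boundary, class 0 below
    return 1 if net_input(w, p, b) >= 0 else 0

w = [1.0, -1.0]   # hypothetical weight vector
b = -1.0          # hypothetical bias: boundary is x - y - 1 = 0, i.e. y = x - 1

print(classify(w, [3.0, 0.0], b))   # one side of the line
print(classify(w, [0.0, 3.0], b))   # the other side
print(net_input(w, [2.0, 1.0], b))  # a point lying exactly on the boundary

# The weight vector is orthogonal to the boundary direction (1, 1)
along_boundary = [1.0, 1.0]
print(sum(w_i * d_i for w_i, d_i in zip(w, along_boundary)))
```

Setting b to zero moves the same line back through the origin, which is the restriction the bias is there to remove.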



3.3.3 Multiple Outputs 

A network may also have multiple outputs (a_1, a_2, ..., a_m for m outputs),
and a classification is not restricted to being one output. If more than one
output is used then it can be shown (e.g. Gish 1990) that for an ideal network 
(i.e. one which always classifies correctly) the figures generated for each class 
correspond to Bayesian a posteriori probabilities. An a posteriori probability 
for an output class is the probability that the output is of that class given 
that the input parameters are the values they are. Hopefully one class will 
have a much higher probability than the others, but sometimes two classes 
may have high values, indicating a galaxy of some intermediate class. For
example if the probabilities were, say, ~45% for Sa and ~45% for Sc then the
true classification might be Sb (the other ~10% would be the low probabilities 
for the other classes). This can be useful for checking if classifications are 
uncertain. A diagnostic for multiple outputs is that the probabilities should 
add to one for each galaxy. Multiple outputs were not tried in this project. 
An example is Storrie-Lombardi et al. 1992 (see Chapter 4). 
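The diagnostic described above (outputs summing to one, and two comparable high values signalling an intermediate type) might be checked along the following lines. This is only a sketch: multiple outputs were not used in this project, and the tolerance `tol`, the `margin` below which two classes are called ambiguous, and the example probabilities are all hypothetical.

```python
def check_outputs(outputs, tol=0.05, margin=0.2):
    # outputs: dict mapping class name -> network output for one galaxy
    total = sum(outputs.values())
    ranked = sorted(outputs.items(), key=lambda kv: kv[1], reverse=True)
    (best, p_best), (second, p_second) = ranked[0], ranked[1]
    return {
        "sums_to_one": abs(total - 1.0) <= tol,    # diagnostic from the text
        "best": best,
        "runner_up": second,
        "ambiguous": (p_best - p_second) < margin, # two classes nearly tied
    }

# Hypothetical outputs resembling the Sa/Sc example in the text
example = {"E": 0.02, "S0": 0.03, "Sa": 0.45, "Sb": 0.03, "Sc": 0.45, "Irr": 0.02}
result = check_outputs(example)
print(result)
```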

3.4 A Layer of Neurons 

The single neuron is easily generalised to several neurons, each with several
inputs and outputs. p is a scalar for one input, and an R-dimensional vector
for R inputs. w becomes an S by R matrix W, for the S neurons in the
layer. The bias becomes an S-dimensional vector. There are S outputs, so
the output is also an S-dimensional vector. See figure 3.7.

Figure 3.7: A single layer of neurons (Matlab site).

For the layer

a = f(Wp + b)

which can also be written as

a_i = f\left(\sum_{j=1}^{R} W_{ij} p_j + b_i\right) where i = 1 ... S
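The layer equation can be written out element by element. The sketch below uses plain Python lists, with an S by R weight matrix stored as S rows of R weights; the numerical values are hypothetical and chosen so that both net inputs are zero.

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def layer(W, p, b, f):
    # a_i = f(sum_j W[i][j] * p[j] + b[i]) for each of the S neurons
    return [
        f(sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i)
        for row, b_i in zip(W, b)
    ]

# S = 2 neurons, R = 3 inputs (hypothetical values)
W = [[0.5, -1.0, 0.0],
     [1.0,  1.0, 1.0]]
p = [2.0, 1.0, 3.0]
b = [0.0, -6.0]

a = layer(W, p, b, logsig)
print(a)  # one output per neuron
```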

3.5 Multiple Layers of Neurons 

The single layer can then be generalised to multiple layers. The outputs from 
the neurons in one layer become the inputs for those in the next. There can 
be a different number of neurons in each layer, and one or more inputs to the 
first layer and outputs from the last layer. There is a separate weight matrix
between each layer, of dimensions S^l by R^l for R^l inputs from the previous
layer and S^l neurons in the current layer, for layer l. The matrix between the
inputs and the first layer is the input weight matrix IW, and those between 
layers are the layer weight matrices LW. Figure 3.8 shows the arrangement 
for three layers. 

Figure 3.8: Three layers of neurons. Other numbers of layers are similar.
Each layer can have any number of neurons, including one. The superscripts
indicate the layer number (1, 2 or 3 here).

For the example of three layers

a^1 = f^1(IW^{1,1} p + b^1)

for layer 1, and

a^2 = f^2(LW^{2,1} a^1 + b^2)
a^3 = f^3(LW^{3,2} a^2 + b^3)

for layers 2 and 3. Also

a^3 = f^3(LW^{3,2} f^2(LW^{2,1} f^1(IW^{1,1} p + b^1) + b^2) + b^3)

where the superscripts refer to the layer of that number, i.e. 1, 2, or 3, and
LW^{3,2} represents the layer weight matrix from layer 2 to layer 3, and so on.

These are of similar form for any number of layers, and simplify to the 
equations earlier for the simpler networks. Multilayer networks are the ones 
most commonly used for classifying galaxies. The galaxy parameters form 
the input layer and the classifications the output layer. See Chapter 4 for 
examples. In this project the output layer is always a single neuron. 
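The nested three-layer expression is simply the layer equation applied repeatedly, with each layer's output becoming the next layer's input. A minimal sketch follows, with tanh hidden layers and a single linear output neuron; the weight matrices are hypothetical and `forward` is a helper name introduced for this example.

```python
import math

def tansig(n):
    return math.tanh(n)

def purelin(n):
    return n

def layer(W, p, b, f):
    # a = f(Wp + b), evaluated row by row
    return [f(sum(w * x for w, x in zip(row, p)) + b_i) for row, b_i in zip(W, b)]

def forward(p, layers):
    # layers is a list of (weight matrix, bias vector, transfer function)
    a = p
    for W, b, f in layers:
        a = layer(W, a, b, f)
    return a

network = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], tansig),   # IW^{1,1}, b^1, f^1
    ([[1.0, 0.0], [0.0, 1.0]],  [0.0, 0.0], tansig),   # LW^{2,1}, b^2, f^2
    ([[1.0, 1.0]],              [0.0],      purelin),  # LW^{3,2}, b^3, f^3
]

a3 = forward([1.0, 1.0], network)
print(a3)  # a single output neuron, as used in this project
```

Replacing every transfer function by `purelin` collapses the whole network to one matrix product, which is the sense in which a network with linear transfer functions is itself linear.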

3.6 Training the Networks 

For a network to classify galaxies correctly, it must perform the correct map- 
ping on the input parameters to produce the output classification. To obtain 
the correct mapping, or, if the parameters have intrinsic spread, the nearest 
approximation, the weights and bias must be adjusted to the optimal values. 
This is either done in an 'unsupervised' manner (see §3.7), or, as described 
here, by training the network. A set of inputs is given which corresponds
to a 'correct' set of classifications. This set is called the training set. The
network reads the inputs and produces an output. The weights and bias are
adjusted to try to bring them closer to the optimal values, using a chosen
training algorithm. This process is iterative, and can be summarised by
the following:

• Select an appropriate training set 

• Run network on the parameters to produce an output 

• Adjust weights and bias 

• Run network again 

• Continue until weights and bias are near to optimal values

3.6.1 A Training Set

The training process causes the network to mimic the typical mappings made 
in the training set to produce the classifications from the parameters. The 
training set must therefore be a representative sample of the data, so that the 
network is able to correctly classify objects which were not used in training. 
Training takes much more processing time than the subsequent running of a 
network. 

In the case of galaxies, the classification is the galaxy type determined by a 
human expert and the parameters are various attributes of the galaxy image. 
For example, if a galaxy has, say, a low central concentration of light,
no discernible bar and quite loosely wound spiral arms then an expert using
the broad categories of the Hubble system would call it Sc. Clearly the
parameters are more quantitative than this, since the network deals with
numerical values. A caution is that the training set should not contain too
many idiosyncrasies of the particular human classifier, otherwise the network
will mimic them. The solution is to average types from more than one person
in some quantifiable way. Typical training sets are a few hundred galaxies, 
each with of the order of ten parameters. Particular parameters, training 
sets and the averaging process are described in Chapter 4. 

The above assumes that the parameters have been successfully extracted 
from the galaxy images. Various software packages exist for doing this and in 
this project only ready-parameterised galaxies are used, both in training and 
in classification. This is not a serious restriction because sufficient galaxies 
are available for training. The galaxies in the SDSS Early Data Release 
are parameterised in this way using the SDSS's dedicated 'Photo' software 
(Lupton et al. 2001). 

3.6.2 Training 

The network is created with an initial set of weights and biases, which are, 
as appropriate, scalars, vectors or matrices of the correct dimension for the 
network arrangement. They are usually set to zero or to random values. After 
the first run with the training data the weights and biases must be adjusted 
to try and move them towards the optimal values for correct classification. 
This is done using a training algorithm. Typically these involve some kind 
of gradient descent, in which the point representing the weights vector is 
moved down the steepest gradient of the error contours towards the minimum 
in the error space. An approximate two dimensional analogy is a ball rolling 
down a sticky hill into a hollow: the ball stops when it reaches the lowest 
point.

There are many different training algorithms described in the literature,
but the one commonly used in galaxy classification is backpropagation.
This tries to minimise the cost function describing the difference between
the classifications output from the network and the true values:

E = \frac{1}{2} \sum_k (o_k - d_k)^2

where E = error, o = network output, d = true output, for a set of k
galaxies. The minimisation is done by applying

\Delta w_{ij}(t+1) = -\eta \frac{\partial E}{\partial w_{ij}} + \alpha \Delta w_{ij}(t)

where t is the iteration number, η is the learning rate, α is the 'momentum'
of the movement of the error point from the previous iteration, and
∂E/∂w_ij is the gradient which is being descended. The second term is optional. η and
α are constants (e.g. Lahav et al. 1996). The corrections are propagated
back along the network, hence the term backpropagation.
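The update rule with its optional momentum term can be tried on the simplest possible case, a single linear neuron o = wp + b with a sum-of-squares error. With no hidden layer there is nothing to propagate back, so this sketch shows only the gradient descent step itself; the training data and the values of `eta` and `alpha` are hypothetical.

```python
def train(data, eta=0.05, alpha=0.5, epochs=200):
    # data: list of (input p, true output d) pairs
    w, b = 0.0, 0.0
    dw_prev, db_prev = 0.0, 0.0
    for _ in range(epochs):
        # gradients of E = 1/2 * sum_k (o_k - d_k)^2 with o_k = w * p_k + b
        grad_w = sum((w * p + b - d) * p for p, d in data)
        grad_b = sum((w * p + b - d) for p, d in data)
        # delta(t+1) = -eta * gradient + alpha * delta(t)
        dw = -eta * grad_w + alpha * dw_prev
        db = -eta * grad_b + alpha * db_prev
        w, b = w + dw, b + db
        dw_prev, db_prev = dw, db
    return w, b

# Hypothetical noise-free training set drawn from d = 2p + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(w, b)  # approaches the generating values 2 and 1
```

Setting `alpha=0` recovers plain gradient descent, which needs more epochs to reach the same accuracy on this example.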

In this project a more sophisticated variant of the algorithm, known as
resilient backpropagation, is used. This compensates to some extent for
the flatness of the sigmoid transfer function for large or small input values,
and in general is faster to converge than simple backpropagation. To the
author's knowledge, the algorithm has not been specifically named and used
in the galaxy classification literature. There, only simple backpropagation has
been used, although other added sophistications such as regularisation (see
§3.7) have been used in some studies (e.g. Nairn et al. (1995b), Lahav et al.
(1996)). The resilient algorithm is available in the Matlab Neural Network Toolbox,
and is described on the website (PDF documentation chapter 5), and in
Riedmiller & Braun (1993). Another algorithm, Levenberg-Marquardt
(Matlab site PDF chapter 5), was also tried, but was found to be less useful
(see §4.4.1). Many further algorithms are available on the site, but these two
were the ones most recommended by the documentation.

However, no matter which training algorithm is used, it is highly unlikely 
that any set of weights it finds will be able to map every galaxy onto its 
correct classification. This is because of the nonlinear nature of the mapping 
and the intrinsic spread in the properties of galaxies. Therefore a criterion 
must be used to decide when to stop training the network, otherwise the 
iteration will just continue indefinitely, with the error tending towards an 
asymptotic minimum value. Possibilities are

• Train until the error drops below a certain value

• Train until the gradient drops below a certain value 

• Train for a fixed number of iterations (or epochs) 

• Train for a fixed amount of processor time 
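The four stopping criteria listed above could be combined into a single test evaluated once per epoch. The sketch below is an assumption about how this might be arranged, not a description of the Matlab toolbox; the function name `should_stop` and all the default thresholds are hypothetical.

```python
import time

def should_stop(error, grad_norm, epoch, started, *,
                max_error=1e-3, min_grad=1e-6,
                max_epochs=1000, max_seconds=60.0):
    # Returns the name of the first criterion met, or None to keep training
    if error < max_error:
        return "error"
    if grad_norm < min_grad:
        return "gradient"
    if epoch >= max_epochs:
        return "epochs"
    if time.perf_counter() - started > max_seconds:
        return "time"
    return None

started = time.perf_counter()
print(should_stop(0.5, 0.1, 10, started))    # still training
print(should_stop(0.5, 0.1, 1000, started))  # epoch limit reached
```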

In general the minimum which the algorithm tends toward is not neces- 
sarily global. It could be a local minimum in the error space which is higher 
than the global minimum. The gradient descent backpropagation algorithms 
described here will only descend, so they could get stuck in a local minimum. 
This can be solved either by using a more sophisticated algorithm, which will 
be more complex and therefore slower, or by averaging the results of several 
runs of the network using different random initial weights and bias. It may 
also be the case that different sets of weights are equally good for classifying. 
The input parameters can also be normalised to be within a certain range, for
example between 0 and 1 (e.g. Nairn et al. 1995b). This allows the relative
values of the weights to be meaningfully compared, thus giving a method of 
assessing the relative importance of the various input parameters. 
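Normalising each input parameter to the range 0 to 1 can be done column by column over the training set. A minimal sketch follows; the two-parameter sample values are hypothetical, and a parameter that is constant over the sample is arbitrarily mapped to zero.

```python
def normalise_columns(rows):
    # rows: one list of parameter values per galaxy
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical sample: (concentration index, a colour) for three galaxies
sample = [[0.30, 1.2],
          [0.45, 0.8],
          [0.60, 1.6]]
scaled = normalise_columns(sample)
for row in scaled:
    print([round(x, 3) for x in row])
```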

A further point to bear in mind is that the parameters η and α are
arbitrary, and so good values must be found essentially by trial and error. If
η is too low the network will not find the optimal weights very quickly and
if it is too high the minimum may be overshot, resulting in weights which
diverge exponentially from the optimal values.
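The effect of the learning rate can be seen on a one-dimensional toy error surface E = w²/2, whose gradient is w and whose minimum is at w = 0. This is an illustration only, with hypothetical values of the learning rate; each step multiplies w by (1 - eta), so the behaviour is easy to check by hand.

```python
def descend(eta, w0=1.0, steps=20):
    # plain gradient descent on E = w**2 / 2, for which dE/dw = w
    w = w0
    for _ in range(steps):
        w -= eta * w
    return w

for eta in (0.01, 0.5, 2.5):
    print(eta, descend(eta))
# eta = 0.01: w has barely moved from 1 after 20 steps (too slow)
# eta = 0.5:  w is very close to the minimum at 0
# eta = 2.5:  each step overshoots, and |w| grows by a factor 1.5 per step
```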

3.7 Further Points 

The networks described above are 

• Backpropagation: the backpropagation (gradient descent) algorithm, 
and the more sophisticated variants described, are used for training 

• Multilayer: the neurons are arranged in layers, with input, hidden 
and output layers 

• Perceptrons: the neurons each have a transfer function, either hard 
limit, linear, sigmoid or tanh 

• Supervised: the networks are trained with an example set of galaxies 
classified by humans

• Feedforward: the training has no effect on the input parameters, just
the weights and biases 

• Static: there is no time dependence in the network 

They are thus known as feedforward multilayer perceptrons. (In fact 
a perceptron neuron originally meant that the neuron had the hard limit 
transfer function, but a multilayer 'perceptron' can have any set of trans- 
fer functions). Clearly many further possibilities exist. Some are now de- 
scribed. More details of principal component analysis, backpropagation, 
quasi-Newton and the Bayesian perspective are, for example, in the appen- 
dices to Lahav et al. 1996. Many of the possibilities are also on the Matlab 
site or in Bishop (1995). 

• Other networks: Many other types of neural network exist, for ex- 
ample radial basis, competitive learning, self-organising maps, learning 
vector quantisation networks, or time dependent examples such as the 
Elman and Hopfield networks. 

• Transfer functions: Besides hard limit, linear, sigmoid and tanh, 
others include competitive transfer, radial basis, triangular basis, or 
simply variations such as hard limit between -1 and 1 instead of 0 and
1, negative linear etc. The possibilities are virtually endless.

• Training algorithms: Besides the gradient descent and resilient forms 
of backpropagation, and the Levenberg-Marquardt algorithm, many 
more routines exist. Examples are the quasi-Newton algorithms,
versions of Newton-Raphson iteration for finding the roots of a quadratic
equation, conjugate gradient descent, where the steepest gradient is not
necessarily followed, and variable learning rate, which can escape from 
local minima. None that have been tried have significantly improved 
the classifications from the networks, although the unsupervised al- 
gorithm is important. 

• Unsupervised networks: Instead of being trained using subjectively 
classified data, these types of network look on their own for clustering 
patterns within the data. They can thus perform classification in a 
completely objective manner, and may find patterns that the humans 
have missed. One sort acts as a nonlinear generalisation of the well-known
statistical tool of principal component analysis (PCA), in which
the linear combination of input parameters with maximum variance is
found. Another is the Kohonen Self Organising Map (KSOM). Both
PCA and KSOMs have been applied to classifying galaxies at moderate
redshifts to try and extend the Hubble sequence in a quantitative way 
into lookback times where galaxy evolution has become significant (e.g. 
Abraham et al. 1994, Nairn et al. 1997). It has been found that 
the Hubble sequence breaks down at moderate to high redshift. PCA 
is important in classifying galaxies spectrally, where the parameters 
correspond to those describing galaxy evolution, such as star formation 
rate. 

• Regularisation: This adds a penalty term to the cost function to stop the
weights becoming too large. It is used by Nairn et al. (1995b) and others.
It is most useful when only small datasets are available for training.

• Methods of initialisation: Besides initialising the weights and bias to
zero, or randomly, algorithms exist to try and optimise the initialisation
in some way. An example is the Nguyen-Widrow algorithm.

• Feedback, Time Dependence and Incremental Training: These 
are of less use for galaxies, but have found use in other areas of astron- 
omy, such as predicting solar activity, as mentioned above. In incre- 
mental training the weights and biases would be adjusted by training 
for several epochs on each individual galaxy in turn as it was pre- 
sented. Thus the network would only remember the galaxy it had just 
been trained on. The method used above, i.e. adjust after being pre- 
sented with all the galaxies, is known as batch training. This is clearly 
preferable. 

• Multiple Networks: The output from one neural network can form 
the input for another, or several networks can be used in parallel in ways 
which are not equivalent to simply having a larger network. Examples 
are the 'waterfall' arrangement of Adams & Woolley (1994) and the 
ensembles of Bazell & Aha (2001). The latter have been found to give
a slight improvement over single networks. The individual networks are 
given random training sets, and, whilst each one on its own performs 
less well, the voting system between their outputs results in the correct 
output being chosen more often. A network is generally more likely to 
choose the correct output than any particular incorrect one. 

• Bayesian Perspective: A final point is that the way neural networks 
have been described in this chapter, and the expressions given, are not 
the only way they can be described. Equivalent descriptions can be 
given in terms of probability theory, i.e. a Bayesian perspective. The
one example included is that of using multiple outputs instead of one,
in which the outputs correspond to a posteriori probabilities. 
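The voting between ensemble members mentioned under 'Multiple Networks' can be sketched as a simple majority vote per galaxy. This is not the scheme of Bazell & Aha (2001) in detail, only an illustration of the idea; the network outputs are hypothetical, and ties are broken in favour of the class that appears first.

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: the class chosen by each network for one galaxy
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five networks for three galaxies
network_outputs = [
    ["E",  "Sb", "Sc"],
    ["E",  "Sa", "Sc"],
    ["S0", "Sb", "Irr"],
    ["E",  "Sc", "Sc"],
    ["E",  "Sb", "Sd"],
]

# Transpose so that each entry holds all the votes for one galaxy
ensemble = [majority_vote(votes) for votes in zip(*network_outputs)]
print(ensemble)
```

Individual networks disagree on every galaxy here, but the incorrect choices are scattered over several classes, so the most common answer still stands out.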

For classifying galaxies, many of these possibilities have either not been 
tried or have not been found to be significantly better than those described 
above. Results from the literature and those from this project are described 
in Chapter 4. 



Chapter 4 



Results: Humans and Linear 
Classifiers versus ANNs 

The project aims to try out various artificial neural network programs and 
see how they compare to each other, to linear classifiers, and to humans 
in correctly classifying galaxies. This section presents and compares the 
programs used, the results obtained by previous studies in the literature, 
and the results obtained by this project. 

4.1 Programs 

Trying every neural network program from the hundreds available was clearly
not feasible since it takes too long to learn to use each one, so emphasis was
placed on trying fewer programs but in more depth. There is no widely used
ANN program for classifying galaxies, so an early task of the project was to 
see what was available on the web. Another constraint is that it would take 
too long to learn and run image analysis software, so ready-parameterised 
images were used. In fact, it turned out that only one program was required, 
because it allowed the generic creation of almost any neural network. This 
program is the Matlab Neural Network Toolbox, described below. 

4.1.1 LMorpho 

LMorpho, or Linux Morpho, was the first program tried. It is a collection 
of routines for working with and classifying galaxy images written by S.C. 
Odewahn at Arizona State University, USA, and includes a neural network 
routine. It is available for download at
http://www.public.asu.edu/~asu-sco/documents/lmorpho/dist/index.html.
Unfortunately, the program
was described as 'not particularly user friendly' by the author, and some
time was spent on installation, configuring the compiler, and getting the 
program to work. It then turned out that full image processing would be 
required to extract parameters in a suitable form to run through the net- 
work. This, and the availability of the Matlab program described below 
meant that this program was abandoned for this project. The version tried 
was lmorpho_jul24_2000, run on Red Hat Linux 6.2. 

4.1.2 Matlab Neural Network Toolbox 

This set of tools within the well known Matlab program enabled the genera- 
tion from scratch of many types of neural network, including those useful for 
galaxy classification. Extensive documentation is available online, including
much introductory material. Results are detailed in §4.4. The version used
was 5.3.1.29215a (R11.1) from Oct 6th 1999. The URL for the documentation
is http://www.mathworks.com/access/helpdesk/help/toolbox/nnet/nnet.shtml.

4.1.3 Other Programs 

The range of options available in Matlab, and the generality of its networks,
meant that no other programs needed to be tried out. Others do
exist (see §5.1).

4.2 Galaxy Parameters 

The number of parameters used to describe a galaxy can vary from one to
infinity. Some kind of compromise must therefore be found which retains most of
the physical information. Many different parameters are used, but usually of
order one to ten per galaxy.

If an inappropriate set is used information is lost and the networks cannot 
classify properly. If too many are used the networks are slow, and in training 
are more likely to get stuck in local error minima and mimic noise in the 
data, both of which render them unable to classify properly new galaxies 
which they have not previously seen. 

In this project the parameters are standard outputs from the SDSS
photometry software Photo v5_0_3 as used in Shimasaku et al. (2001) (given
by them as 'late 1999', see also Lupton et al. (2001)). These include the
r_50/r_90 (inverse) concentration index, likelihood of de Vaucouleurs profile
P_deV, likelihood of exponential profile P_exp, and colour in the five SDSS
bands u*, g*, r*, i*, z*. These parameters are now described.

The concentration index defined in Shimasaku et al. (2001) is based on
the Petrosian radius r_P. This is independent of galaxy distance, because it
is based on galaxy surface brightness, and is given by the implicit expression

\eta = \frac{I(r_P)}{2\pi \int_0^{r_P} I(r)\, r\, dr \,/\, (\pi r_P^2)}

where r = linear radius, I = intensity, η = Petrosian ratio (see below).

This is for an annulus of infinitesimal width and finite diameter within the
object. For a value of r_P in practice, values are adopted such that the local
surface brightness in an annulus of finite width from αr_P to βr_P is a fraction η
of the mean surface brightness within r_P. The mean surface brightness is the
Petrosian flux f_P per unit area πr_P², i.e.

\langle I \rangle = \frac{f_P}{\pi r_P^2} = \frac{2\pi \int_0^{r_P} I(r)\, r\, dr}{\pi r_P^2}

thus

\eta = \frac{2\pi \int_{\alpha r_P}^{\beta r_P} I(r)\, r\, dr \,/\, [\pi (\beta^2 - \alpha^2) r_P^2]}{2\pi \int_0^{r_P} I(r)\, r\, dr \,/\, (\pi r_P^2)}

Usually η = 0.2, α = 0.8, β = 1.25. These values are used in the literature,
including Shimasaku et al., and are hence used here.

The concentration index is then given by the ratio of the radii r_a/r_b where
the radii are those at which a and b percent of the total Petrosian flux are
received. Usually a = 50 and b = 90, giving r_50/r_90.
(Petrosian equations adapted from Shimasaku et al. (2001) and Strateva et
al. (2001).)

The de Vaucouleurs profile (de Vaucouleurs 1948) describes the radial
distribution of intensity in elliptical galaxies, and is given by:

I(r) = I_e \exp\{-7.67[(r/r_e)^{1/4} - 1]\}

where I(r) = intensity at radius r, I_e = intensity at r_e, and r_e = the half-light
radius for the galaxy.

The exponential profile (e.g. Freeman 1970) is similar, but for spiral
galaxy disks:

I(r) = I_0 \exp(-1.68 r/r_e)

where I_0 = central intensity. So P_deV and P_exp correlate with whether a
galaxy is spiral or elliptical. The probabilities are standard χ² fits to the
respective profiles.
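The two profiles, and the kind of value the concentration index takes, can be illustrated numerically. The sketch below evaluates both profiles and then finds r_50 and r_90 for a pure exponential disk by bisection. Two simplifications are assumed: the enclosed flux is measured against the total flux of the model rather than the Petrosian flux, and the prefactor of the de Vaucouleurs profile is taken to be the intensity at r_e. The helper names are hypothetical.

```python
import math

def de_vaucouleurs(r, r_e, i_e=1.0):
    # i_e is the intensity at r = r_e, since the bracket vanishes there
    return i_e * math.exp(-7.67 * ((r / r_e) ** 0.25 - 1.0))

def exponential(r, r_e, i_0=1.0):
    # i_0 is the central intensity
    return i_0 * math.exp(-1.68 * r / r_e)

def exp_flux_fraction(r, r_e):
    # fraction of the total flux of an exponential disk inside radius r,
    # from integrating I(r) * 2 * pi * r analytically
    x = 1.68 * r / r_e
    return 1.0 - (1.0 + x) * math.exp(-x)

def radius_enclosing(fraction, r_e):
    # bisection on the monotonically increasing enclosed-flux curve
    lo, hi = 0.0, 50.0 * r_e
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exp_flux_fraction(mid, r_e) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_e = 1.0
r50 = radius_enclosing(0.5, r_e)
r90 = radius_enclosing(0.9, r_e)
print(r50, r90, r50 / r90)
```

The half-light radius recovered this way agrees with r_e to within the rounding of the constant 1.68, and the ratio r_50/r_90 for the pure exponential disk comes out a little above 0.43.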

Colours are given as differences between magnitudes in defined wave- 
length bands for a galaxy. Shimasaku et al. (2001) use the preliminary 
Sloan Digital Sky Survey (SDSS) bands u*, g*, r*, i* and z*. These are 
shown in table 4.1. The values used in the project are not corrected for 
galactic extinction, and the mean reddenings in the SDSS Early Data Re- 
lease (EDR) for each band are 0.21, 0.15, 0.11, 0.08, and 0.06 magnitudes. 
However, this is small compared with the intrinsic dispersions in the colours 
and in the context of using neural networks. Also, the preliminary SDSS 
magnitudes are so-called because the calibration is preliminary. The final 
magnitudes may differ by as much as 0.1. 
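Forming colours from the five magnitudes is a matter of differencing adjacent bands, which yields four independent colours. A minimal sketch, in which the magnitudes are hypothetical values for a single galaxy and the band names are written without the asterisks:

```python
BANDS = ["u", "g", "r", "i", "z"]

def colours(mags):
    # difference of magnitudes in adjacent bands, bluer band first
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in zip(BANDS, BANDS[1:])}

# Hypothetical magnitudes for one galaxy
mags = {"u": 19.0, "g": 17.5, "r": 16.75, "i": 16.5, "z": 16.25}
galaxy_colours = colours(mags)
print(galaxy_colours)
```

Any other colour, such as u - r, is a sum of these four, so it carries no additional information for the network.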

Table 4.1: The SDSS Photometric System (Stoughton et al. (2001), table
20). FWHM = full-width half-maximum, a measure of the bandwidth.

Band    Wavelength/Å    FWHM/Å
u*      3551            581
g*      4686            1262
r*      6166            1149
i*      7480            1237
z*      8932            994

The independent colours are therefore u* - g*, g* - r*, r* - i* and i* - z*.
Many more parameters have been used in the literature, for example

• Isophotes: various values of Petrosian radii could be measured, e.g.
r_10, r_20, ..., r_90; or η in the concentration parameter expression above
could be varied.

• Absolute magnitudes: or Petrosian fluxes could be given in a similar 
way to the radii. 

• Spiral arm pitch angle: The angle between the spiral arm and the 
tangential direction to the galaxy centre at the point where the arm 
starts can be quantified. 

• Harmonics: Block et al. (1999) use harmonics in spiral galaxy disks, 
viewed in the infrared so that the dust extinction is low, as a basis for 
classifying spiral disks. 

• Bar strength: An example is Abraham & Merrifield (2000), who
define the bar strength as (2/π)[arctan (b/a)^{-1/2} - arctan (a/b)^{-1/2}]. This
gives zero for no bar and one for an infinitely strong bar.

• Measures of asymmetry: The image of a galaxy is projected on the 
sky, so measurements such as the ratio of the longest diameter to the 
shortest for an image can quantify how face-on or edge-on the image 
is. Edge-on disk galaxies are difficult or impossible to classify.

• Octants: The profile fits, etc. can be measured for octants or
quarters, etc. of the galaxy image. Storrie-Lombardi et al. (1992) fit
generalised de Vaucouleurs profiles to various numbers of octants (the
'oct' parameters in figure 4.1).
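Of the parameters listed above, the bar strength is simple enough to evaluate directly. The sketch below writes the Abraham & Merrifield (2000) expression in terms of the single axis ratio q = b/a of the bar, which is assumed to lie between 0 (infinitely thin) and 1 (round, i.e. no bar); the function name is hypothetical.

```python
import math

def bar_strength(q):
    # q = b/a, the axis ratio of the bar, with 0 < q <= 1
    return (2.0 / math.pi) * (math.atan(q ** -0.5) - math.atan(q ** 0.5))

for q in (1.0, 1.0 / 3.0, 0.01):
    print(q, round(bar_strength(q), 4))
# q = 1 gives no bar; smaller q gives a stronger bar
```

For q = 1/3 the two arctangents are 60 and 30 degrees, so the bar strength is exactly one third, which gives a convenient check on the implementation.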

The parameters assume that effects such as extinction and reddening of 
light by intervening dust have been accounted for. Maps exist of galactic 
extinction (e.g. Schlegel et al. 1998), although the correction is still a poten- 
tial source of error. Note again that the colours used in §4.4 have not been 
corrected, but the effect is small in this context. 

4.3 Results from Previous Studies 

Various quantitative classification studies using automated classification tech- 
niques have been carried out since the early 1980s. An important example, 
and one of the first to use neural networks, is Storrie-Lombardi et al. (1992). 
They used a feedforward backpropagation network with various
configurations, including 13:13:5 (13 inputs, 13 neurons in first layer and 5 outputs),
shown in figure 4.1. They compared the outputs from this network with
the classifications of human experts for 5217 galaxies from the European 
Southern Observatory catalogue (ESO-LV, Lauberts & Valentijn 1989). They 
found that the network agreed with the humans 64% of the time and agreed 
to within one class (of the 5) 90% of the time (table 4.2). The ESO-LV 
classifications were given in the catalogue itself and so were separate from 
the Storrie-Lombardi et al. project. The 64% compares with 56% from the
automatic classifier ESO-AUTO, which used conventional linear
statistics as opposed to an ANN. Thus the ANNs showed a significant
improvement. 
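The two agreement percentages can be recomputed from the counts in table 4.2 as the fraction of galaxies lying on the diagonal of each matrix. In the sketch below the counts are transcribed from that table, with rows as the human (ESO-LV) class and columns as the automatic class; the two empty cells in the Irr row of the ANN matrix are assumed to be zero, which makes the row totals of the two matrices agree.

```python
# Rows: human class E, S0, Sa+Sb, Sc+Sd, Irr; columns: automatic class
ANN = [
    [203,  77,   25,   1,   5],
    [109, 229,  240,   7,   2],
    [ 12,  85, 1281, 218,  15],
    [  1,   4,  304, 415,  36],
    [  0,   0,   53,  69, 126],   # first two entries assumed to be zero
]
ESO_AUTO = [
    [197,  87,  17,   5,   5],
    [184, 218, 155,  28,   2],
    [106,  12, 791, 664,  38],
    [ 22,  11,  24, 631,  72],
    [ 22,   9,  31,  42, 144],
]

def agreement(matrix):
    # fraction of galaxies for which the two classifications coincide
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total

print(round(100 * agreement(ANN)), round(100 * agreement(ESO_AUTO)))
```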




Figure 4.1: A network used by Storrie-Lombardi et al. (1992). This one is 
13:13:5. Others are similar. (From Storrie-Lombardi et al. 1992.) 

The ANNs also followed the distribution of types much better than the 
linear ESO-AUTO, again demonstrating their increased capacity to classify 
comparably with humans (figure 4.2). 

Table 4.2: Humans (ESO-LV) in rows, versus ANN and ESO-AUTO in
columns, modified from Storrie-Lombardi et al. 1992. The diagonals give
the classifications which agree, and amount to 64% of the total for the ANNs
and 56% for ESO-AUTO. For classifications within one class, the agreement
is 90%.

                 Humans vs. ANNs                 Humans vs. ESO-AUTO
Class      E    S0  Sa+Sb  Sc+Sd  Irr       E    S0  Sa+Sb  Sc+Sd  Irr
E        203    77     25      1    5     197    87     17      5    5
S0       109   229    240      7    2     184   218    155     28    2
Sa+Sb     12    85   1281    218   15     106    12    791    664   38
Sc+Sd      1     4    304    415   36      22    11     24    631   72
Irr        0     0     53     69  126      22     9     31     42  144

Figure 4.2: Number of galaxies versus type, for the 13:13:5 ANN, ESO-LV
and the linear ESO-AUTO (Storrie-Lombardi et al. 1992). Note that the
original figure is also in grayscale.

No human trials were carried out in this project, but a trial has been
carried out by Nairn et al. (1995a). This study created a uniform sample of
831 nearby bright galaxies from photographic plates from the APM galaxy
survey (Maddox et al. 1990). Laser prints of the galaxy images were given
to six human experts and they were asked to classify the galaxies according
to the de Vaucouleurs T system (described in §2.3.1). (In fact one of the
experts (van den Bergh) used computer screen images and the DDO system,
later converted to T-type, but the difference is negligible.) It was found that
there was on average a root mean square (RMS) deviation of 1.8 T types
between the experts.

The mean square deviation (variance) between two observers used is given 

by 

4 = >< 5> - 

^9 al gal 

where N ga i is the number of galaxies classified by both observers. The RMS 
values between the experts in Nairn et al. (1995b) are shown in table 4.3. 
The higher values between the experts and the RC3 classifications of the 
same galaxies illustrates the usefulness of a uniform sample (the RC3 uses a 
different image set). Subjective classifications will depend on the appearance 
of the image to some degree. (RC3 = the Third Reference Catalogue of Bright 
Galaxies, de Vaucouleurs et al. 1991). 
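
As an illustration of the mean square deviation defined above, the following Python sketch computes the RMS deviation between two observers, using only the galaxies that both have classified. The observer lists are made-up T types for illustration, not data from the study, and `None` is an assumed convention for a galaxy an observer did not classify.

```python
import math


def rms_deviation(types_i, types_j):
    """RMS difference in T type over galaxies classified by both observers.

    A value of None marks a galaxy that an observer did not classify.
    """
    diffs = [a - b for a, b in zip(types_i, types_j)
             if a is not None and b is not None]
    if not diffs:
        raise ValueError("the observers have no galaxies in common")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical T types for six galaxies.
observer_a = [-5, 1, 3, None, 5, 10]
observer_b = [-4, 1, 5, 2, None, 8]
print(rms_deviation(observer_a, observer_b))  # prints 1.5 (N_gal = 4)
```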

Naim et al. (1995b) then studied the classifications produced by various
ANNs. The comparison is also described in Lahav et al. (1995). This study
was the first and only systematic comparison based on a uniform sample of
galaxy images presented to several experts from different 'schools of
thought'. The results are in table 4.4.



Table 4.3: Root mean square deviations between experts' classifications and
between the experts and the RC3 catalogue (Lahav et al. 1995, table 2). RB
= R. Buta; HC = H. Corwin; GV = G. de Vaucouleurs; AD = A. Dressler;
JH = J. Huchra; vdB = S. van den Bergh; RC3 = 3rd Reference Catalogue of
Bright Galaxies (de Vaucouleurs et al. 1991).

Classifier    RB    HC    GV    AD    JH    vdB
RC3          2.2   2.1   1.8   2.3   2.2   2.4
RB                 1.3   1.6   1.7   1.8   1.7
HC                       1.5   1.8   1.9   1.9
GV                             1.7   1.8   1.9
AD                                   2.1   1.8
JH                                         2.0






Table 4.4: RMS deviations between the experts and the ANN. The ANN is 
a 13:5:1 configuration with 10 runs averaged with different initial weights. 
Other configs. e.g. 13:13:1 gave similar results (Lahav et al. 1995 table 
3). 'Mean' = when the ANN is trained and run on the mean of the human 
types, with 'a few outliers removed.' It does not equal the mean of the other 
columns in the table. 





                RB    HC    GV    AD    JH    vdB   Mean
ANN            1.9   2.0   2.2   1.9   2.3   2.2    1.8
N_gal of 831   764   812   473   814   824   549    831



This study finds that 'on the whole there is a reasonable consistency 
in the way people classify galaxies, but the scatter is significant' and that 
'the ANNs can replicate the expert's classification of the APM sample as 
well as other colleagues or students of the expert'. This suggests that the 
network was about as capable as the humans of classifying galaxies correctly 
using the T type system. Other statistics are also given, including a crude 
estimate of the internal scatter or reproducibility of the classifications of an 
individual observer using a dataset with a lower resolution. One can also plot
the network versus itself with a different set of random weights, and perform 
various other tests. 

Many other studies have been made in which galaxies are morphologically
classified by some kind of automatic system, including neural networks (e.g.
Goebel et al. 1989, Thonnat 1989, Spiekermann et al. 1992, Doi et al. 1993,
Serra-Ricart et al. 1993, Abraham et al. 1994, Adams & Woolley 1994,
Odewahn et al. 1996, Naim et al. 1997, Bazell & Aha 2001),
and many more still have looked at ways to quantify galaxy properties, but
Naim et al. (1995b)/Lahav et al. (1995) is the only systematic comparison
between ANNs and a large number of independent experts.

4.4 Project Results and Discussion 
4.4.1 Networks & Shimasaku Types 

Shimasaku et al. (2001) have obtained eyeball classifications for 456 galaxies
from the Sloan Digital Sky Survey (SDSS) by having four of their experts
compare the images with those of the Frei and Gunn galaxy catalogue (Frei et
al. 1996), which are given RC3 types. In this project these eyeball types are
compared with the results of the Matlab networks, which took the parameters
of the images as input (parameters to be published in Fukugita et al., in
preparation). The Shimasaku et al. classifications are coarser than the T
system but the authors consider them 'sufficient for most purposes for galaxy
science'. They are designated 'Shimasaku Types' here, and are shown in table 4.5.



Table 4.5: T-type versus type from Shimasaku et al. (2001). The name 
'Shimasaku Type' is adopted here for their types. 



Hubble Type       E0       S0          S0            Sa    Sb
T type            -6 -5    -4 -3 -2    -1     0      1     2 3
Shimasaku Type    0        1           1             2     3

Hubble Type       Sc       Sdm                Im
T type            4 5      6 7 8       9      10     11    90 99
Shimasaku Type    4        5                  6            99.99



Two galaxies, which had one or more Petrosian magnitudes of '99.99',
were removed to leave a set of 454. (The 99.99 is so far above the other values
of 15-20 that when these parameters were run on their own, as below, the 99.99s
seriously affected the results.) Several other 'outliers' which had magnitudes
of around 22 in the wrong magnitude column were left in, since most real
data are likely to have outliers of this sort, due to imperfect photometry, and
the networks could 'cope' with these.
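
The removal of galaxies carrying the '99.99' placeholder magnitude can be sketched as a simple filter. This is an illustrative Python fragment rather than the Matlab code used in the project; the row layout (one list of five magnitudes per galaxy) and the sample values are assumptions made for the example.

```python
import math

SENTINEL = 99.99  # placeholder magnitude marking a failed measurement


def drop_sentinel_rows(rows, sentinel=SENTINEL):
    """Keep only galaxies with no magnitude equal to the sentinel value."""
    return [row for row in rows
            if not any(math.isclose(m, sentinel) for m in row)]


# Hypothetical Petrosian magnitudes (u*, g*, r*, i*, z*) for three galaxies.
galaxies = [
    [18.2, 16.9, 16.1, 15.8, 15.6],
    [99.99, 17.4, 16.8, 16.5, 16.3],  # removed: sentinel in the u* column
    [19.0, 22.1, 17.2, 16.9, 16.7],   # kept: an ordinary outlier near 22
]
clean = drop_sentinel_rows(galaxies)
print(len(clean))  # prints 2
```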

The parameters were initially run through a single linear neuron ('purelin' 
transfer function). This acts as a linear classifier and thus provides a 'control 
set' against which the improvements achieved by the networks can be seen. 
The parameters used are given in table 4.6. Note that a single apparent 
magnitude, r*, is included. This is not expected to correlate with galaxy 
type but may help by, for example, causing the network to give greater 
weight to bright galaxies, in which the parameters are better resolved, and 
thus perhaps more likely to show consistent patterns. The r* is used because 
the Petrosian magnitudes in all five bands are defined using r* so that all the 
bands can be observed using the same aperture. 

Typical combinations of parameters and results for the single neuron are
given in table 4.7, for training and classification on the 454 galaxies. The
results in table 4.7 are arranged such that each subsequent set of parameters
shown gave an improved result. For linear neurons, the error space has only one
minimum, which is global. Hence there is no need to do several runs for
different random initial weights. It is merely necessary not to set the learning
rate too high (which gives diverging weights), and to train for sufficient time
that the value of the minimum becomes clear. In Matlab the linear neuron
is always trained using the 'by weight and bias' function 'trainwb', and no
gradient value is output. Thus the number of epochs given in the table is
that at which the fifth decimal place of the mean squared error (MSE, which
is output) first stops changing, to the nearest 25 epochs.
From the shape of the training curve (figure 4.3), one can see that the value
of the global minimum is always clear at this point.
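
The training procedure for the single linear neuron can be sketched in a few lines. The following Python fragment is not the Matlab 'trainwb' routine: it is a plain batch gradient descent written for illustration, with a stopping rule modelled on the 'fifth decimal place of the MSE stops changing' criterion described above (without the rounding to the nearest 25 epochs). The function name, the toy data and the learning rate are all assumptions.

```python
def train_linear_neuron(inputs, targets, learning_rate, max_epochs=100000):
    """Batch gradient descent on the MSE of a single linear neuron.

    Stops at the first epoch where the MSE, rounded to 5 decimal places,
    equals that of the previous epoch.
    """
    n = len(inputs)
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    previous = None
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        outputs = [sum(w * x for w, x in zip(weights, row)) + bias
                   for row in inputs]
        errors = [o - t for o, t in zip(outputs, targets)]
        mse = sum(e * e for e in errors) / n
        if previous is not None and round(mse, 5) == round(previous, 5):
            return weights, bias, mse, epoch
        previous = mse
        # Gradient of the MSE with respect to each weight and the bias.
        for k in range(len(weights)):
            grad = 2.0 * sum(e * row[k] for e, row in zip(errors, inputs)) / n
            weights[k] -= learning_rate * grad
        bias -= learning_rate * 2.0 * sum(errors) / n
    return weights, bias, mse, max_epochs


# Toy data lying exactly on the line t = 2x + 1.
xs = [[i / 19.0] for i in range(20)]
ts = [2.0 * row[0] + 1.0 for row in xs]
weights, bias, mse, epochs = train_linear_neuron(xs, ts, learning_rate=0.5)
print(weights, bias, mse, epochs)
```

Raising the learning rate well above the value used here makes the weights diverge instead of converge, which is the behaviour the text warns about.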

[Plot: mean squared error versus training epoch, 0 to 675 epochs; plot title 'Performance is 1.23384, Goal is 0']



Figure 4.3: A typical ANN training curve, showing the mean squared error 
between the network output galaxy type and the Shimasaku Type, versus 
number of epochs of training. For a single neuron, as shown here, there is 
only one minimum reached, which is global. More complex networks have 
more jagged training curves, but they retain the same form. The neuron here 
has clearly reached its minimum. 

From table 4.7, it is clear that the more parameters are used, the better
the fit the network is able to make to the training set. However, it is also clear
that fairly good fits can be achieved by using just one or two parameters.
In particular, the (inverse) concentration index r_50/r_90 (parameter 1) and
the g* - r* colour (parameter 5) provided good fits for these 454 galaxies.
The importance of the concentration index is in agreement with numerous
previous studies. Values in the table are given to 3 significant figures but
probably only the first is important. Plots of network type versus true type
for the concentration index, and for all 8 parameters are shown as figures 4.4
and 4.5.



[Plot: network type versus Shimasaku Type; legend: 'o Results' and the 'Types equal' line]

Figure 4.4: Network type versus Shimasaku type for the single linear neuron 
using just the concentration index as input. Note the distortion at low and 
high Shimasaku type values. 

The mean squared error (MSE) and root mean square error (RMS) between
network type and Shimasaku Type are given for each parameter set.
These RMSs between seven types can be compared with the Naim et al.
(1995b) RMS of 1.8 in 16 types. Scaling linearly by the number of types,
1.8 in 16 is equivalent to 1.8 × 7/16 ≈ 0.79 in 7. Thus the
linear neuron on its own is not as good, as expected.
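
The quantities used to compare network types with eyeball types can be written out explicitly. The Python sketch below is illustrative only (the project computed these in Matlab); the function names are assumptions, and the rescaling simply applies the linear ratio of the number of types used in the text.

```python
import math


def mse(predicted, true):
    """Mean squared error between two equal-length lists of types."""
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)


def rms(predicted, true):
    """Root mean square error, i.e. the square root of the MSE."""
    return math.sqrt(mse(predicted, true))


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rescale_rms(value, n_types_from, n_types_to):
    """Linearly rescale an RMS between classification systems."""
    return value * n_types_to / n_types_from


print(round(rescale_rms(1.8, 16, 7), 2))  # prints 0.79
```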

The parameter set 12345678 was then tested with various actual 
neural networks. Details are given in table 4.8, and results in table 4.9. 
The results show that the larger networks are able to classify the galaxies 
essentially arbitrarily well. However, they are almost certainly 'overfitting'

[Plot: network type ('Results') versus Shimasaku et al. 2001 eyeball values (0 = E, 1 = S0, 2 = Sa, 3 = Sb, 4 = Sc, 5 = Sdm, 6 = Im); legend: 'o Results' and the 'Types equal' line]

Figure 4.5: Network type versus Shimasaku type for the single linear neuron 
using all 8 parameters as input. Runs from the actual networks in fact look 
very similar: improved correlation gradually moves the types towards the 
'types equal' line, moves most of the 6's up towards the same line, and gives 
a narrower spread about the line. By the time the correlation looks visually 
significantly better the network is probably overfitting the training set. 



Table 4.6: Parameters of the networks used in this project

No    Parameter
1     r_50/r_90
2     P_deV
3     P_Exp
4     u* - g*
5     g* - r*
6     r* - i*
7     i* - z*
8     r*

Table 4.7: Parameters used and results for the single neuron. Compare the
RMS column with the value of 0.79 (equivalent to 1.8 T types in the 16 from
-5 to 10) found between humans, and between humans and ANNs, in Naim et
al. (1995b). N.B.: MSE (not shown) = mean squared error between network
type and Shimasaku type; RMS values are given to 2 d.p.; the number of
epochs is that at which the fifth decimal place of the MSE first stopped
changing. The RMS values shown are equal to √MSE.



Parameter(s)       RMS (Shimasaku   Correlation   Maximum         Number of
used               Types)           Coefficient   learning rate   epochs trained
3                  1.51             0.238         0.0043          200
7                  1.49             0.285         0.0042          775
2                  1.45             0.354         0.0042          175
4                  1.44             0.373         0.0012          300
2 3                1.38             0.451         0.0042          200
6                  1.33             0.510         0.0038          275
5                  1.19             0.643         0.0029          225
1                  1.11             0.698         0.0038          700
1 2 3              1.06             0.731         0.0037          775
1 5                1.02             0.752         0.0027          1275
1 5 6              1.01             0.751         0.0024          1450
1 2 3 5            0.99             0.769         0.0026          1750
1 2 3 5 6          0.98             0.776         0.0024          1875
1 2 3 4 5 6        0.98             0.777         0.00099         4075
1 2 3 4 5 6 7      0.97             0.778         0.00099         4100
1 2 3 4 5 6 7 8    c. 0.95          c. 0.791      0.000017        after 95300

the data. That is, the large networks have so many weights that they are able
to 'remember' each galaxy individually rather than rely on generalisations.
Thus they would be unable to cope when presented with a new set of galaxies.
These galaxies would take a different path through the weight space and be
classified essentially at random.

The point to note is that the networks (larger and smaller), when only
using some of the parameters, can equal the performance of the single neuron
when it uses them all, and for the same parameter set improve on the single
neuron, e.g. from a correlation of ~0.7 ± 0.1 to ~0.8 ± 0.1 for just r_50/r_90.
It is unfortunate that when they significantly improve by using all the
parameters they are, with this training set, overfitting the data. The networks
are also much quicker to train than the single neuron when using 8 or more
parameters, as can be seen in tables 4.7 and 4.9.

There was found to be very little difference between the performance using
sigmoid and using tanh transfer functions. Sigmoids were chosen because
they map positive inputs onto an output between 0 and 1, and with the exception
of a few 'outlier' colours, the galaxy parameters are all positive. Because
normalisation was not carried out, the outputs were always kept as pure linear
so that they could range from 0 to 6 (the Shimasaku types), as opposed
to the 0 to 1 of a sigmoid. The Levenberg-Marquardt algorithm converged
more quickly in small networks, but was unsuitable for large networks and
sometimes failed. One can also see from table 4.9 that adding more neurons
in a layer is generally more beneficial than adding more layers. The training
times are given in the same table. The time taken to classify the 454 galaxies
varies from about 8 to 12 seconds depending on the complexity of the network.
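
The reason for pairing sigmoid hidden neurons with a pure linear output can be seen from a forward pass through a small S:1 network. This is a hand-written Python sketch, not the Matlab toolbox code; the weights are arbitrary illustrative numbers, not trained values.

```python
import math


def logsig(x):
    """Logistic sigmoid, mapping any real input into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: sigmoid hidden layer followed by a linear output neuron."""
    hidden = [logsig(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(hidden_weights, hidden_biases)]
    # The linear output is unbounded, so it can span the types 0 to 6.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# A hypothetical 2:1 network with two inputs.
output = forward([0.0, 0.0],
                 hidden_weights=[[1.0, 1.0], [1.0, -1.0]],
                 hidden_biases=[0.0, 0.0],
                 output_weights=[2.0, 4.0],
                 output_bias=1.0)
print(output)  # prints 4.0, since each hidden neuron outputs logsig(0) = 0.5
```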

Ideally the means should be from a larger number of runs than five, but 
the standard deviations are shown in the table for the MSE's and correlations, 
and confirm that the more complex networks do indeed produce a lower MSE. 
The standard deviations could also be calculated for table 4.7. 

A listing from which all the networks shown here can be generated is 
shown in Appendix B. 

4.4.2 Application to the SDSS Early Data Release 

The networks were then applied to the Sloan Digital Sky Survey Early Data
Release (SDSS EDR, Stoughton et al. 2001). A set of galaxies was selected
from the catalogue archive server (http://archive.stsci.edu/sdss/software/)
using the SDSS Query tool and the query

SELECT ra, dec, z, zErr, zConf, zStatus, tag.petroMag,
tag.reddening, tag.petroR50_r, tag.petroR90_r, tag.lExp_r,

Table 4.8: Summary of network details. Note: the architectures are S:1
etc. where S = no. of neurons in the layer. The number of inputs and
outputs are not shown (always 8 and 1 respectively). Transfer functions: l
= sigmoid, p = pure linear, t = tanh; training: lm = Levenberg-Marquardt,
rp = resilient backpropagation; those unmarked are 'lp_rp'. An example:
8:1_tp_rp is 8 neurons in the first layer, one in the second (as output), the
transfer functions are tansig, purelin and the training algorithm is resilient
backpropagation.



Architectures          1_p, 8:1_tp_lm, 8:1_tp_rp, 8:1_lp_rp, S:1 and S:S:1
                       where S=8, 16, 32 and 64, (except 64:64:1),
                       128:128:128:128:1
Initial weights        Here in fact all 1 for 1_p (better is zero but
                       makes no difference to the global minimum),
                       randomised by Matlab network initialisation
                       for other architectures
Trained on             454/454
Training algorithms    1_p: trainwb, rest: trainlm and trainrp
Learning rate          1_p: maximum without the weights diverging,
                       rest: default
Other parameters       Matlab defaults (see Appendix B)

Table 4.9: Results for various network architectures when applied to the 454
galaxy training set using all 8 parameters. Each architecture was trained for
1000 epochs. As in table 4.7, compare the RMS (again = √MSE) column
with the value of 0.79 (equivalent to 1.8 T types in the 16 from -5 to 10)
found between humans, and between humans and ANNs, in Naim et al. (1995b).
N.B.: Transfer functions: l = sigmoid (logsig), p = pure linear, t = tanh;
training: lm = Levenberg-Marquardt, rp = resilient backpropagation; those
unmarked are 'lp_rp'. The architecture notation is the same as in table
4.8, where it is explained. The standard deviations are sample standard
deviations. Also note: [1] The trainlm failed twice in the 5 runs, so the
result here is from 3; [2] the 128:128:128:128:1 was only run once, and simply
shows an extreme example of overfitting.



Network              RMS & Stdev from 5        Correlation   Time to train for
                     runs (Shimasaku Types)    Coefficient   1000 epochs
1                    1.11     0                0.698         0:08
8:1_tp_rp            0.869    0.0374           0.828         0:28
8:1_tp_lm            0.843    0.0276 [1]       0.836         1:22
8:1_lp_rp            0.847    0.0222           0.844         0:28
16:1                 0.808    0.0327           0.857         0:37
8:8:1                0.795    0.0328           0.858         0:44
32:1                 0.771    0.0358           0.867         0:55
8:8:8:1              0.764    0.0775           0.869         0:58
64:1                 0.703    0.0089           0.891         1:21
16:16:1              0.698    0.0390           0.893         0:58
16:16:16:1           0.622    0.0658           0.915         1:19
32:32:1              0.558    0.0627           0.932         1:40
64:64:1              0.516    0.0168           0.943         3:06
32:32:32:1           0.444    0.0164           0.958         2:18
128:128:128:128:1    0.106    n/a [2]          0.998         c. 20 min

tag.lDeV_r, tag.obj.field.segment.run,
tag.obj.field.segment.rerun, tag.obj.field.segment.camCol,
tag.obj.field.field, tag.obj.objid, tag.rowC, tag.colC
FROM specobj
WHERE (primTarget == AR_TARGET_GALAXY)

This returned a set of 29,433 galaxies with spectra and redshifts. These
were chosen a) to provide a reasonably sized dataset and b) so that the
distribution of the galaxies could potentially be plotted by the morphological
type determined by the networks. Four galaxies were removed from the
set, because they had values of r_50 and r_90 equal to zero (hence r_50/r_90 is
undefined), leaving a set of 29,429.
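
The exclusion of galaxies with zero radii amounts to a guard on the concentration index. A minimal Python sketch follows; the function name and the sample radii are assumptions made for illustration, not values from the EDR.

```python
def inverse_concentration(r50, r90):
    """Return r50/r90, or None when either Petrosian radius is not positive."""
    if r50 <= 0 or r90 <= 0:
        return None
    return r50 / r90


# Hypothetical (r50, r90) pairs; the second mimics the zero-radius cases.
radii = [(2.0, 5.0), (0.0, 0.0), (1.5, 4.5)]
indices = [inverse_concentration(r50, r90) for r50, r90 in radii]
usable = [c for c in indices if c is not None]
print(len(usable))  # prints 2
```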

It was found that, as suspected, the large networks probably were overfitting
the Shimasaku et al. training set, since the outputs on running them on
the SDSS EDR set were clearly incorrect (e.g. all the same type). Thus the
simpler networks were run, using just the concentration parameter, and gave
the results in table 4.10. Again further runs should be performed to get a
better set of averages, but the overall observation that the mean percentages of
elliptical, spiral and irregular galaxies are in the ratio 14 ± 5 : 86 ± 12 : 0 ± 0.1
compares reasonably with the observed values of approximately 20:80:1 at
low redshift. (The galaxies in this SDSS EDR set have a mean redshift
z = 0.1034 ± 0.00011; the ±5, ±12 and ±0.1 come from the sum of errors
$\sigma^2 = \sigma_A^2 + \sigma_B^2 + \dots$, where $\sigma$ is the absolute error.) The more detailed types
in the table are distorted at each end (types 0 and 5, 6 are rarely output),
because the use of just the concentration parameter causes the distortion
seen in figure 4.4.
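
The sum of errors used for the quoted uncertainties is the usual addition of absolute errors in quadrature. A short Python sketch is given below; the function name is illustrative and the input values are arbitrary, not the errors of the individual Shimasaku types.

```python
import math


def combine_in_quadrature(errors):
    """Combine independent absolute errors: sigma^2 = sum of sigma_i^2."""
    return math.sqrt(sum(e * e for e in errors))


# Arbitrary example: two contributing errors of 3 and 4 percentage points.
print(combine_in_quadrature([3.0, 4.0]))  # prints 5.0
```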

4.4.3 Further Steps 

The next step should be to train the various networks on a subset of the
Shimasaku data to assess and hopefully quantify which ones are overfitting
the data, and then use the best ones which are not doing so on the EDR set.
On the other hand, a larger training set may be needed, since the Shimasaku
set is from the commissioning data, which, like the EDR, is from a narrow
band of sky. Thus large scale structures in the region could distort the
percentages of galaxy types. (Clusters have a higher percentage of ellipticals
and S0s, up to 50% or more than the mean, and more sparsely populated
regions (parts of walls which are not clusters, filaments and voids) have more
spirals.)
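
Training on a subset and testing on the remainder could be set up with a simple hold-out split. The Python sketch below only produces the index sets; the 25% test fraction and the fixed seed are arbitrary assumptions, and a network whose error on the held-out galaxies is much larger than on the training galaxies would be flagged as overfitting.

```python
import random


def holdout_split(n_items, test_fraction=0.25, seed=0):
    """Split the indices 0..n_items-1 into disjoint training and test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(n_items * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = holdout_split(454)
print(len(train_idx), len(test_idx))  # prints 341 113
```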

Other steps would be to vary the learning rate for the actual networks
as opposed to just the single neuron (each input weight, layer weight and

Table 4.10: Results from applying the networks to the SDSS Early Data
Release. Each network type is run using just the concentration parameter
from the 454 Shimasaku et al. set to train. Each is then run (training
and classification) on the first 100 galaxies of the SDSS EDR set 10 times,
and an average taken, and on the full set once. For a trained network to
classify the 29,429 galaxies of the full set takes around 10 minutes, as shown.
The networks are all sigmoid with linear output and trained using resilient
backpropagation. The single linear neuron is trained for 675 epochs (where
its MSE 5th decimal place stayed the same for the first time, as above); the
rest were trained for 1000 epochs.



                                    Number of each Shimasaku Type
                                  E0     S0     Sa     Sb    Sc   Sdm    Im
Network   Run      MSE             0      1      2      3     4     5     6
1         100      1.23384         0     12     47     27    12     2     0
8:1                1.10838         0     12     33     40    15     0     0
8:8:1              1.04274         0     13     29     48    11     0     0
64:1               1.03261         0      5     36     43    16     0     0
1         29429    1.23384        48   5272  11658   8603

bias rate must be explicitly set using matrices of the correct connectivity
net.inputConnect, net.layerConnect or net.biasConnect (see Appendix B)),
and to include the galactic extinction, since this parameter is given in the
SDSS EDR set.

Further possibilities are discussed in Chapter 5. 



Chapter 5 

Extensions to the Project 



There are many possible ways to extend the project, from specific further 
work which could be done to more general aims which the community is 
working towards. 

5.1 Further Networks and Data 

The very large number of possibilities in MATLAB and in other available
programs has not been exhausted by this project. Some possibilities were
described in §3.7. Many further network programs are available on the web.
The FAQ site for the newsgroup comp.ai.neural-nets lists, as of February 1,
2008, 40 commercial programs and 43 shareware programs, plus numerous
source code locations in its parts 5 and 6. The site
http://www.emsl.pnl.gov:2080/proj/neuron/neural/systems/shareware.html
lists 62 commercial and 59 shareware programs.

Other examples of programs used for galaxy classification (URLs in ref- 
erences) include LMorpho, and Autoclass (Cheeseman & Stutz 1996, Goebel 
et al. 1989). There are also many dedicated codes written by research groups 
for their particular projects. 

Further data would include the digital sky surveys (SDSS, 2dF, etc.), the 
scanned photographic plates (APM, SuperCOSMOS (Hambly et al. 2001)), 
and the previously classified galaxy atlases (ESO-LV, RC3, etc.). Of these 
the new surveys would be the most useful, because of their unprecedented 
amount, quality and consistency of data.

5.2 Further Morphological Schemes

The networks here have produced Shimasaku types as output. These are 
based on Hubble's scheme. Clearly other schemes, such as the Yerkes, DDO 
etc. could be used. However, expert human eyeball classification for a large 
galaxy sample by a group of independent human experts has been, and only 
could have been, performed using Hubble's scheme. So other schemes would 
require checking of a network's reliability by humans, to make sure that it is 
correctly applying the scheme. 

5.3 Spectral Schemes 

Classifying galaxies spectroscopically as opposed to morphologically would 
be the most important general extension to the work in this project. Indeed, 
in the choices presented as MSc projects it was a possible alternative to this 
one. Essentially similar work could be carried out. Much literature suggests 
that spectral classifications are the most meaningful in terms of parameters 
which can be observed and predicted by numerical or semi-analytic galaxy 
formation models, and by similar models of galaxy evolution. However, pa- 
rameters such as the concentration index are equally important and so the 
best classification systems will probably use both spectral and morphological 
parameters. This project uses galaxy colours, which are basically integrated 
stellar spectra, and so could be called spectral. However, the concentration 
index has been the most significant parameter, hence the 'morphological' in 
the project title. 

5.4 Unsupervised Learning 

The two schemes above, and as many meaningful parameters as possible 
could be fed in (not necessarily all at once) to an unsupervised network to 
see what patterns it comes up with. The two unsupervised learning methods 
of nonlinear principal component analysis and the Kohonen Self Organising 
Map could be used. An example of the latter is Naim et al. (1997), who
investigate patterns in parameters for galaxies at moderate redshift. 

5.5 Other Wavebands and 'Channels' 

The galaxies here were imaged at optical wavelengths. The appearance of
galaxies alters considerably at other wavelengths (e.g. ultraviolet, infrared),
and this could be investigated. An example is the Block et al. (1999) result
that the appearance of galaxies at near infrared wavelengths does not
correlate with Hubble type, because galactic dust is much more transparent
at these wavelengths.

The electromagnetic spectrum is only one astrophysical 'channel' for 
which information is available from the sky. Others include neutrinos, grav- 
itational waves or dark matter. Of these, only neutrinos have been detected 
directly (and then with great difficulty), but each is present in galaxies so 
should eventually be explained. Indeed, the 'theorist's view' of a galaxy is 
often that of a large approximately spherical halo of dark matter with a small 
smudge of luminous matter somewhere in the middle! (The dark matter con- 
tains most of the mass, which is important in the theory and simulation of 
galaxy and large scale structure formation.) 

5.6 Galaxy Evolution 

The galaxies in this project are essentially at redshift zero (i.e. at z ~ 0.1), 
so the effects of galactic evolution are small. Galaxies at higher redshifts 
could be investigated, and indeed have been. An example is the detailed 
investigation of galaxy morphologies within the Hubble Deep Field (Abraham 
et al. 1996), and many papers have been published suggesting quantitative 
schemes for extending the Hubble sequence to moderate and high redshifts 
(e.g. Abraham & Merrifield 2000). The Hubble sequence has been shown to 
be inadequate in describing galaxies at redshifts where evolution is significant. 
Parameters such as the concentration index are useful because they broadly 
correlate with galaxy type and can be measured to much greater distances 
than can detailed morphology. 

5.7 Unusual Galaxies 

The Hubble scheme was designed to include bright, nearby, clearly spiral or 
elliptical galaxies. However, active galaxies (active galactic nuclei, quasars), 
interacting galaxies, and the vast numerical majority of small dim galaxies 
(dwarf ellipticals, dwarf spirals, most irregulars etc., which are therefore not 
at all 'unusual') are not included. It should be possible to understand these 
in terms of the parameters the classification schemes show to be meaning- 
ful. Other galaxies have recently been detected solely from the 21cm line of 
neutral hydrogen (HI).

5.8 Further Possibilities

The combination of accurate galaxy types and the breadth, depth and
consistency of coverage from the Sloan survey would allow many aspects of
galaxies and large scale structure to be investigated in unprecedented detail. For
example, the statistics of the structure could be analysed as a function of
galaxy type (e.g. clustering strength, luminosity function). These sorts of
observations would provide a detailed set of data to constrain simulations
and semi-analytic theories, e.g. Benson et al. (2001) quote Sloan for testing
z = 0 and the VIRMOS survey for high redshifts. The next data release for
the SDSS is scheduled for January 2003, and the full dataset of 50 million
galaxies, 1 million of which will have associated spectra and hence redshifts,
should be available by 2005.



Chapter 6 
Conclusions 



The main findings of this project are: 

• Automated classification systems are essential to be able to classify the
galaxies in the new digital sky surveys. The largest of these, the Sloan
Digital Sky Survey (SDSS), will collect a full dataset of 50,000,000
galaxies in π steradians of the northern galactic cap by 2005. Other
million-plus datasets exist on photographic plates.

• The trained networks only perform classification of the galaxies that 
they have not seen into broad categories, but the intrinsic spread in 
galaxy properties means that this is sufficient for most purposes. How- 
ever, with the sheer numbers of galaxies becoming available, what is 
'spread' and what is real variation may be distinguishable. 

• The correlation between network type and type found by the human 
classifiers in Shimasaku et al. (2001) for 454 galaxies from the SDSS 
commissioning data was around 0.7 ± 0.1 for a linear classification 
method (single linear neuron) using just the concentration index, and 
improved to around 0.8 ±0.1 for simple networks. These figures also 
improve by another 0.05 for simple networks compared with the single 
neuron for the same set of parameters. Larger networks using all the 
parameters were able to produce correlations arbitrarily close to 1, but 
these were overfitting the particular training set and would not be able 
to cope when presented with previously unseen galaxies. A simple net- 
work using the most significant parameters (concentration index, one 
or two colours, and perhaps the profiles) is probably best. 

• Simple networks trained on the Shimasaku et al. (2001) galaxy set
were briefly applied to a set of 29,429 galaxies with spectra from the
SDSS Early Data Release. They gave mean Shimasaku type percentages
corresponding to an elliptical/lenticular:spiral:irregular ratio of
14 ± 5 : 86 ± 12 : 0 ± 0.1. These are averages from only a small number
of runs, but compare with the observed ratios of ~ 20:80:1 for galaxies
at low redshift. The SDSS data is from a narrow strip of sky, and the
networks distort the ends of the distribution, because just the r_50/r_90
concentration parameter was used, as figure 4.4 showed.

• Morphological galaxy parameters are only a subset of those impor- 
tant for describing galaxies. The most important parameters are those 
which relate directly to galaxy evolution. These can then be compared 
with predictions from semi-analytic galaxy formation models. These 
models connect with the present through models of galaxy evolution, 
and observations of galaxies at high, medium and low redshift. See e.g. 
Baugh et al. (2001). Thus the morphological concentration parame- 
ter, and the strongly correlated bulge-to-disk ratio are important, as 
are spectral parameters such as integrated galaxy colour, line strengths 
and estimated star formation rates. Morphology can often vary sig- 
nificantly when there is not much physical variation (e.g. the rings 
and S's in galaxy disks which were used in the de Vaucouleurs 3D sys- 
tem). Spectral parameters tend to be more correlated with physical 
parameters. 

• Supervised networks can never classify galaxies better than the set on 
which they are trained. This set can be a slight improvement over 
humans if the set is the mean of several observers, but it will still 
inevitably be subjective to some extent. Unsupervised classification 
using principal component analysis (PCA), and a combination of mor- 
phological and spectral parameters, is the ideal. Some authors, e.g. 
Naim et al. (1997) suggest that it may be better simply to think of
distributions in parameter space rather than performing classifications. 
This can then be done at any redshift. 

• In the time available for the project (November 2000 - August 2001, full
time from June 2001), only a small fraction of the available possibilities
could be tried out. Important extensions to the work would be firstly to
test the available training set more fully, base the quoted averages on
more network runs, and then look in more detail at the SDSS Early
Data Release. Other types of network could then be tried, for example
unsupervised with PCA, and networks using other parameters
and classification schemes, e.g. spectral. One could then look at galaxies
in other wavebands, in particular the infrared where dust in galaxy
disks is almost transparent. Finally galaxies at moderate and high redshift,
where evolution has become significant, must be described in a
quantitative way by any complete classification scheme.

• The most significant conclusion of the project is probably that the 
tools are available to create any number of neural networks at a level 
of sophistication used in the artificial intelligence community, and not 
just by astronomers. The MATLAB Neural Network Toolbox is one 
example of such a toolset, and it is available in a standard form to 
all. High performance computers would enable more runs and shorter 
training times than those quoted here. The ideal galaxy classification 
system would then use the principal component analysis also available 
in the toolbox, feed the components into an unsupervised network, and 
classify the galaxies in the SDSS dataset as they become available. The 
components would probably be directly related to parameters used to 
describe galaxy evolution, some of which are morphological and some 
spectral. Trying out all the possibilities in Matlab would form a 
worthwhile PhD or research project. 
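
The 'ideal' pipeline described in the last point above, PCA followed by an unsupervised classifier, can be illustrated with a minimal sketch. It is not taken from the project: the data are synthetic, all variable names are hypothetical, and a plain k-means clustering stands in for the unsupervised network (it is not one of the toolbox's own network types). Only base Matlab functions are used, so that no toolbox call signatures are assumed.

```matlab
% Sketch only: PCA followed by k-means clustering on synthetic data.
% All names and numbers here are illustrative assumptions.
n = 100;                                  % galaxies per synthetic group
s = linspace(-1, 1, n)';                  % deterministic spread within a group
groupA = [0.30 + 0.05*s, 0.5 + 0.2*sin(5*s), 1.0 + 0.1*cos(3*s)];
groupB = [0.50 + 0.05*s, 1.5 + 0.2*sin(5*s), 2.0 + 0.1*cos(3*s)];
X = [groupA; groupB];                     % 2n-by-3 matrix, one row per galaxy
m = size(X, 1);

% Standardise each parameter to zero mean and unit variance
Z = (X - repmat(mean(X, 1), m, 1)) ./ repmat(std(X, 0, 1), m, 1);

% PCA: eigenvectors of the covariance matrix, largest eigenvalue first
[V, D] = eig(cov(Z));
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);
scores = Z * V;                           % principal component scores
explained = lambda / sum(lambda);         % fraction of variance per component

% k-means (k = 2) on the first two components
k = 2;
Y = scores(:, 1:2);
[lo, iLo] = min(Y(:, 1));
[hi, iHi] = max(Y(:, 1));
C = [Y(iLo, :); Y(iHi, :)];               % initial centres: extremes along PC1
label = zeros(m, 1);
for iter = 1:50
    d = zeros(m, k);
    for j = 1:k
        delta = Y - repmat(C(j, :), m, 1);
        d(:, j) = sum(delta.^2, 2);       % squared distance to centre j
    end
    [dmin, newLabel] = min(d, [], 2);     % assign each galaxy to nearest centre
    if isequal(newLabel, label), break; end
    label = newLabel;
    for j = 1:k
        C(j, :) = mean(Y(label == j, :), 1);
    end
end
disp(explained')                          % variance fractions, largest first
disp([sum(label == 1), sum(label == 2)])  % number of galaxies in each cluster
```

With real data, X would hold one row per galaxy and one column per morphological or spectral parameter, and the number of clusters k would itself have to be investigated.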
Appendix B 
Matlab ANN Listing 



This is a sample Matlab Neural Network Toolbox listing, written from 
scratch by the author. Networks used in this project can be obtained by 
altering lines in the listing, as explained in the comments. The % signs 
indicate the comment lines, which are ignored by Matlab when the file is 
run. 
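
Before the listing, it may help to see what its default configuration, a single linear neuron started from zero weights and bias, amounts to numerically. The following is a minimal sketch in base Matlab and is not part of the author's listing, nor is it the toolbox's training routine: it applies a plain batch gradient-descent (Widrow-Hoff type) update to synthetic data. The inputs, the 'true' values wTrue and bTrue, and the values of lr and ep are all assumptions chosen for illustration.

```matlab
% Sketch only: a single linear neuron, output = w*P + b, trained by batch
% gradient descent on the mean square error. Synthetic data throughout.
N = 200;                                  % number of synthetic 'galaxies'
u = linspace(0, 1, N);
P = [0.5 + 0.5*cos(2*pi*u); ...           % 3-by-N input matrix, one column per
     0.5 + 0.5*sin(2*pi*u); ...           % galaxy, each input within [0, 1]
     0.5 + 0.5*cos(4*pi*u)];
wTrue = [2 -1 3];                         % assumed underlying relation
bTrue = 0.5;
t = wTrue*P + bTrue;                      % 1-by-N targets

w = zeros(1, 3);                          % zero initial weights
b = 0;                                    % zero initial bias
lr = 0.5;                                 % learning rate
ep = 2000;                                % number of epochs
mseHistory = zeros(1, ep);
for epoch = 1:ep
    e = t - (w*P + b);                    % errors over the whole batch
    mseHistory(epoch) = mean(e.^2);
    w = w + lr*(e*P')/N;                  % step down the MSE gradient
    b = b + lr*mean(e);
end
fprintf('final MSE = %g\n', mean((t - (w*P + b)).^2));
disp([w b])                               % trained weights and bias
```

Because the targets here are an exact linear function of the inputs, the trained values approach wTrue and bTrue. With real galaxy parameters the mean square error would instead level off at a non-zero value, which is the asymptote referred to in the listing's comments on the number of epochs.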



%************************************************
%
%%Classifygeneral:
%%
%%Trained on R input parameters from 'params.dat'
%%file used by Shimasaku et al. 2001:
%%
%%Parameters are combinations of:
%%r50/r90, P_DeV, P_Exp, u*-g*, g*-r*, r*-i*, i*-z*, r*
%%
%%(see tables 5.6 and 5.7)
%%
%%1 output of galaxy Shimasaku Type 0 to 6 (see Shimasaku et al.)
%%
%%Shimasaku Types are eyeball classifications, or 'true' Shimasaku types
%%Network outputs a corresponding estimate or network type for each galaxy
%%All 454 galaxies are used for training
%%(otherwise only a subset of the information is used)
%%
%%Can also read and classify the file 'spectra2.dat', corresponding
%%to the 29,429 SDSS Early Data Release galaxies, once trained on
%%the ones from Shimasaku et al.
%%
%%Uncomment desired network
%%
%% % = line is commented out
%% %% = line is a comment
%%
%%In this file the network is currently set up to run on parameters 1, 2 and 3
%%(r50/r90, P_DeV, P_Exp), using a single linear neuron, set to zero initial
%%weights and bias
%%
%%Last update: 27/08/01 Mon
%

%
%%Load 'params.dat' into a 456 by 24 matrix 'params'
%%The file must be present in the current directory when Matlab is run
%%Each row is one galaxy
%
load params.dat
%
%%Set network architecture:
%%
%%Choose an architecture and type the correct
%%network for the run number, as described
%%
%%Note that single 'hardlim' (perceptron),
%%'logsig' and 'tansig' neurons are not used, because:
%%
%%Hardlim is insufficient because the output is 0 or 1 and a continuous output
%%of 0 to 6, or the normalised equivalent, is required
%%
%%Logsig and tansig are not correct for the output layer, which they form
%%in the case of a single neuron, because they would distort the output
%%and compress it to between 0 and 1 and -1 and 1 respectively
%%
%
%
%%Single linear neurons:
%
%%Type 'net = newlin([0 1;0 1; ... ],1)' with R [0 1]'s,
%%one for each of the R input galaxy parameters
%%The [0 1] is nominally the input range but it did not
%%make any difference to the results using any range

net = newlin([0 1;0 1;0 1],1);



%
%%Feedforward backpropagation networks:
%%
%%Purelin is always used as the output layer transfer
%%function so that the output is not distorted
%%S:S:1 can be generalised to S:T:1, etc.
%%but this does not significantly improve performance,
%%and requires further sets of layer weights
%
%%S:1 multilayer networks:
%%Type 'net = newff([0 1;0 1; ... ],[S,1],{'fcn','purelin'})'
%%where there are R [0 1]'s, S neurons in the first layer
%%(e.g. 8 but important to try others), and 'fcn' is the transfer function
%%(hardlim, purelin, logsig or tansig)
% S=8;
% net = newff([0 1;0 1;0 1],[S,1],{'tansig','purelin'})

%%S:S:1 multilayer networks:
%%Type 'net = newff([0 1;0 1; ... ],[S,T,1],{'fcn','fcn','purelin'})'
% S=8;
% net = newff([0 1;0 1;0 1],[S,S,1],{'tansig','tansig','purelin'})
%

%%Initialise network here so that any settings
%%set below override a clean network
%%If the weights are explicitly set as opposed to randomised
%%at the start, (which is what Matlab does otherwise)
%%this then makes the net give the same results for the same
%%input parameters and settings each time, as it should
%%(Initialisation just before the training section below does not do this)
%
net=init(net);
%

%%Set network preferences, including weights and biases
%%Values not explicitly set in the file are defaults (D)
%%
%%The set of specifications given here is long but does
%%completely specify any network used in this project
%%
%%They can be viewed in Matlab by typing their names,
%%when a network has been created
%%
%%Here " = same as line above
%%

%%Architecture:
%%
%%Settings allow more general connections between the neurons,
%%e.g. last layer to first, etc.
%%These are not employed here
%%
%%net.numInputs                       - set by 'newlin', 'newff', etc.
%%                                      i.e. network creation
%%net.numLayers                       - "
%%net.biasConnect                   D - bias set so one connects to each layer:
%%                                      [1] for 1 layer, [1;1] for 2,
%%                                      [1;1;1] for 3, etc.
%%net.inputConnect                  D - inputs connected to input layer only:
%%                                      [1], [1;0], [1;0;0], etc.
%%net.outputConnect                 D - outputs connected to output layer only:
%%                                      [1], [0 1], [0 0 1], etc.
%%net.targetConnect                 D - targets connected to output layer only:
%%                                      [1], [0 1], [0 0 1], etc.
%%net.numOutputs                    D - always 1
%%                                      (i.e. 1 galaxy type output by network)
%%net.numTargets                    D - always 1 (one Shimasaku Type per galaxy)
%%net.numInputDelays                D - always 0 (no time dependence in network)
%%net.numLayerDelays                D - "
%%
%%
%%Subobject Structures:
%%
%%net.inputs{i}                       - i=1, 2, etc., one set for each layer
%%net.inputs{i}.range                 - matrix of minimum and maximum values
%%                                      for inputs (set in network creation)
%%net.inputs{i}.size                  - number of inputs
%%                                      (set in network creation)
%%net.inputs{i}.userdata            D - (notes)
%%net.layers{i}
%%net.layers{i}.dimensions            - S, S, ... 1, i.e. number of neurons
%%                                      in each layer (set in network creation)
%%net.layers{i}.distanceFcn         D - dist: function to apply weights to an
%%                                      input to give weighted inputs
%%net.layers{i}.distances             - S by S matrix for each layer
%%                                      (set in network creation)
%%net.layers{i}.initFcn             D - initialisation function for the layers
%%net.layers{i}.netInputFcn         D - netsum: function to calculate the overall
%%                                      input to a layer by combining the
%%                                      weighted and biased inputs
%%net.layers{i}.positions             - [0 1 2 3 4 ...] to S-1 for each layer
%%                                      (set in network creation)
%%net.layers{i}.size                  - S S ... 1, i.e. number of neurons
%%                                      in each layer (set in network creation)
%%net.layers{i}.topologyFcn         D - 'hextop' (not used)
%%net.layers{i}.transferFcn           - hardlim, purelin, logsig or tansig
%%                                      for each layer
%%net.layers{i}.userdata            D - (notes)
%%net.outputs{i}                      - set by net.outputConnect
%%net.outputs{i}.size
%%net.outputs{i}.userdata           D - (notes)
%%net.targets{i}                      - set by net.targetConnect
%%net.targets{i}.size
%%net.targets{i}.userdata           D - (notes)
%%net.biases{i}
%%net.biases{i}.initFcn             D - (none) specific initialisation for biases
%%net.biases{i}.learn               D - 1
%%net.biases{i}.learnFcn            D - (depends on training algorithm)
%%net.biases{i}.learnParam            - lr, other learning functions
%%                                      have extra parameters
%%net.biases{i}.size                  - [S], [1]
%%net.biases{i}.userdata            D - (notes)
%%net.inputWeights{i,j}               - one set for each connection
%%net.inputWeights{i,j}.delays      D - (none) network has no time dependence,
%%                                      i.e. it is static
%%net.inputWeights{i,j}.initFcn     D - (none) specific initialisation
%%                                      for input weights
%%net.inputWeights{i,j}.learn       D - 1
%%net.inputWeights{i,j}.learnFcn    D - (depends on training algorithm)
%%net.inputWeights{i,j}.learnParam    - lr, etc.
%%net.inputWeights{i,j}.size          - [S R] size of input weight matrix
%%net.inputWeights{i,j}.userdata    D - (notes)
%%net.inputWeights{i,j}.weightFcn   D - dotprod (scalar product)
%%net.layerWeights{i,j}
%%net.layerWeights{i,j}.delays      D - (none) network is static
%%net.layerWeights{i,j}.initFcn     D - (none) specific initialisation
%%                                      for layer weights
%%net.layerWeights{i,j}.learn       D - 1
%%net.layerWeights{i,j}.learnFcn    D - (depends on training algorithm)
%%net.layerWeights{i,j}.learnParam    - lr, etc.
%%net.layerWeights{i,j}.size          - [T S] size of layer weight matrix
%%                                      between layers with S and T neurons
%%net.layerWeights{i,j}.userdata    D - (notes)
%%net.layerWeights{i,j}.weightFcn   D - dotprod
%%
%%Functions:
%%
%%net.adaptFcn                      D - adaptwb (used for incremental as
%%                                      opposed to batch training)
%%net.initFcn                       D - (depends on training algorithm)
%%net.performFcn                      - mse (mean square error between network
%%                                      output galaxy type and Shimasaku Type)
%%net.trainFcn                        - e.g. trainrp
%%                                      (see 'Set Training Parameters' below)
%%
%%Parameters:
%%
%%net.adaptParam                    D - .passes (1 pass through network is the
%%                                      default before the learning algorithm
%%                                      is applied)
%%net.initParam                     D - (none)
%%net.performParam                  D - (none)
%%net.trainParam.epochs               - set below: number of epochs to train for
%%net.trainParam.goal               D - default is mse=0, usually replaced
%%                                      by .epochs or .min_grad
%%net.trainParam.lr                   - learning rate: set for single neurons,
%%                                      left on default 0.01 for networks
%%net.trainParam.max_fail           D - used in 'early stopping' training
%%net.trainParam.min_grad           D - gradient value
%%                                      at which to stop training (no value set)
%%net.trainParam.show               D - show mse etc. every .show number
%%                                      of epochs (default is 25)
%%net.trainParam.time               D - stop training after certain
%%                                      processor time (default is infinite)
%%
%%Weight and bias values:
%%
%%net.IW{i,j}                         - can be set below, with indices
%%                                      corresponding to net.inputConnect
%%net.LW{i,j}                         - can be set below, with indices
%%                                      corresponding to net.layerConnect
%%net.b{i}                            - can be set below, with indices
%%                                      corresponding to net.biasConnect
%%
%%Other:
%%
%%net.userdata                      D - (notes)
%%
%

%
%%Set other preferences to zero to clear previous run
%%And show that they are zero
%%If some (e.g. bias3) are not used then they are shown as zero again below,
%%as opposed to causing an error by being undefined
%
iw=0
lw=0
bias=0
bias2=0
bias3=0
lr=0
ep=0
%
%%Ask for, or leave to default:
%%
%%lr = learning rate (as high as possible without diverging weights)
%%ep = number of epochs (so that MSE asymptote is approximated)
%%
%
lr=input('Learning rate (e.g. 0.001): ');
ep=input('Number of training epochs (e.g. 1000): ');
%

%%Set weights and bias
%%
%%Uncomment set for appropriate run (table 4.7) and network architecture
%
%%To explicitly set the weights and biases and stop them being randomised
%%(most useful when doing quick comparison
%%of parameter combinations with a single linear neuron)

%%Single neurons (no layer weights lw)
%%Type iw=[x y z ...]; with R values for R inputs
%%Usually x=y=z=...=0 as a starting point
%%No layer weights are set
iw=[0 0 0];
lw=[];

%%S neurons in first layer
%%Type iw=[x1 y1 z1 ...;x2 y2 z2 ...] with R values
%%between each semicolon and S sets of values
%%Again x1=x2=y1=y2=...=0 is usual
%%Currently S=8 (value of T irrelevant for input weights)
% iw=[0 0 0;0 0 0;0 0 0;0 0 0;0 0 0;0 0 0;0 0 0;0 0 0];

%%Layer weights for networks with S neurons in first layer
%%Type lw=[x y z ...] for S neurons in each hidden layer
%%(i.e. layers which are not the output layer)
% lw=[0 0 0 0 0 0 0 0];

%%Biases
%%The arrays with more than one zero require S zeros,
%%each separated by semicolons
%%Uncomment the appropriate set and change if necessary

%%Single neuron
bias=[0];

%%S:1
% bias=[0;0;0;0;0;0;0;0];
% bias2=[0];

%%S:S:1
% bias=[0;0;0;0;0;0;0;0];
% bias2=[0;0;0;0;0;0;0;0];
% bias3=[0];
%

%%Convert the iw, lw and bias values to the network notation
%%Ones which are zero are weight combinations which are not used,
%%e.g. from 2nd layer to first, etc.
%%They are shown for completeness (the %% lines)
%%Uncomment appropriate set
%
%%Single neuron
net.IW{1}=iw;
net.b{1}=bias;

%%Multiple neurons
% net.IW{1,1}=iw;
%%net.IW{2,1}=0;
% net.LW{2,1}=lw;
%%net.LW{1,1}=0;
%%net.LW{1,2}=0;
%%net.LW{2,2}=0;
% net.b{2}=bias2;
%
%%Set learning rates for weights and biases (all equal to lr, set above)
%
%%Single neuron
net.inputWeights{1}.learnParam.lr=lr;
net.biases{1}.learnParam.lr=lr;

%%S:1 configuration
% net.inputWeights{1,1}.learnParam.lr=lr;
% net.layerWeights{2,1}.learnParam.lr=lr;
%%Rest are again zero
% net.biases{1}.learnParam.lr=lr;
% net.biases{2}.learnParam.lr=lr;

%%S:S:1 etc. require further setting from 'iw=0' onwards above,
%%as shown in the matrices for net.biasConnect, net.inputConnect
%%and net.layerConnect
%

%%Set training parameters
%%trainFcn's available are: trainb, trainbfg, trainbr,
%%traincgb, traincgf, traincgp, traingd, traingda, traingdm,
%%traingdx, trainlm, trainoss, trainrp, trainscg
%%
%%Some of these have extra parameters
%%See Matlab Neural Network Toolbox site
%%(Chapter 5 in PDF manual) for details of each
%%
%%Single neurons use 'trainwb', i.e. train by weights and bias
%%'trainlm' and 'trainrp' are good for galaxy classification
%%(trainlm for small nets, trainrp for any size)
%
% net.trainFcn='trainrp';
net.trainParam.epochs=ep;
% net.trainParam.min_grad=;
% net.trainParam.goal=gl;



%
%%Show values before training starts
%%Zeros are shown if not set above
%
iw
lw
bias
bias2
bias3
lr
ep



%
%%Training
%
%
%%Set inputs p and targets t initially to zero to clear previous network run
%%This is not done by the network initialisation function
%%The run only uses the p's corresponding to the desired input galaxy parameters
%%p1 ... p8 correspond to each parameter from table 5.6
%%See table 4.7 and below to see which parameter combinations are run
%
p1=0; %r50/r90
p2=0; %P_DeV
p3=0; %P_Exp
p4=0; %u*-g*
p5=0; %g*-r*
p6=0; %r*-i*
p7=0; %i*-z*
p8=0; %r*

t=0;
%
%%Show that each is zero in an array rather than on 9 separate lines
%
Inputs_zero=[p1 p2 p3 p4 p5 p6 p7 p8]
Target_zero=[t]
%

%%Read the input data matrix params, from rows 1 to 454
%%(Unneeded p values can be commented out to save processing time)
%%The for loop is ended further below
%%This is the point where the parameters could be normalised
%
for a=1:454;
p1(a)=params(a,21);
p2(a)=params(a,22);
p3(a)=params(a,23);
% p4(a)=params(a,11)-params(a,12);
% p5(a)=params(a,12)-params(a,13);
% p6(a)=params(a,13)-params(a,14);
% p7(a)=params(a,14)-params(a,15);
% p8(a)=params(a,13);
t(a)=params(a,20);
%
%%Set target outputs as the values read
%%
%%The value of P is not shown (hence the ;), because it is a LARGE matrix!
%
%%Type P=[pa; pb etc.]; to match read parameters
%%e.g. P=[p1; p2; p3]; for run123
P=[p1; p2; p3];
%
%%Train the network on rows st-fin
%%using chosen training algorithm
%%trainlm, trainrp, traingdm etc.
%%
%%If the end were placed after the [net,tr], this would
%%give incremental as opposed to batch training
%
end
[net,tr]=train(net,P,t);



%
%%Use the trained network to classify all the galaxies in the desired data file
%%
%%params.dat = Shimasaku et al. 2001
%%spectra2.dat = Sloan Digital Sky Survey Early Data Release
%%29,429 galaxies with spectra
%%
%%Read relevant columns of loaded matrix rows st-fin and run net on each row,
%%Store results in z
%%
%%err gives difference between trained network's type and true galaxy type
%%Doesn't matter if training set is used again in classification run
%
%
%%Trained network asks for the rows in the data which require classifying
%%
%%st = row at which to start
%%fin = row at which to finish
%
st=input('Row number to start classifying at (1-454, 830 or 29433): ');
fin=input('Row number to finish at: ');
%
%%Read the loaded data file.
%%params.dat is already loaded so it doesn't
%%need to be loaded again if being read here
%%
%%If a set is being classified which is smaller than a previous
%%run's set, Matlab must be restarted to clear z(b),
%%which will be the size of the previous run
%
for b=st:fin;
%%To classify Shimasaku et al.:
q1=params(b,21);
q2=params(b,22);
q3=params(b,23);
% q4=params(b,11)-params(b,12);
% q5=params(b,12)-params(b,13);
% q6=params(b,13)-params(b,14);
% q7=params(b,14)-params(b,15);
% q8=params(b,13);

%%To classify the SDSS Early Data Release 29,429 set:
% load spectra2.dat
% q1=spectra2(b,17)/spectra2(b,18);
% q2=spectra2(b,20);
% q3=spectra2(b,21);
% q4=spectra2(b,7)-spectra2(b,8);
% q5=spectra2(b,8)-spectra2(b,9);
% q6=spectra2(b,9)-spectra2(b,10);
% q7=spectra2(b,10)-spectra2(b,11);
% q8=spectra2(b,9);

%%Type Q=[qa; qb etc.] to match read parameters
%%e.g. Q=[q1; q2; q3] for run123
Q=[q1; q2; q3];

z(b)=sim(net,Q);
err(b)=z(b)-params(b,20);
errsq(b)=err(b)^2;
end



%
%%Output statistics from the network
%
%
%%Weights and bias after training, used to classify the galaxies
%%rmserr = RMS error between network type and true type
%%(equals square root of Matlab's MSE output)
%%stddev = Standard deviation of network types
%%correlation = correlation coefficient matrix from corrcoef (-1 to 1),
%%i.e. Pearson's linear coefficient; Spearman's rank coefficient
%%requires t and z to be replaced by their ranks
%
iw
lw
bias
bias2
bias3

rmserr=sqrt(mean(errsq))

for c=st:fin;
t(c)=params(c,20);
end

correlation=corrcoef(t,z)



%
%%Plot results (requires 'xhost' [machine name] command
%%on local machine if displaying window from a remote machine)
%
%
%%Form the 'y=x' line where network output equals the Shimasaku Type
%%
%%x2 y2 etc. are for offset test - makes no difference to correlation
%
for x=1:7;
% x2(x)=x-1;
% y2(x)=x;
x2(x)=x-1;
y2(x)=x-1;
end
%
%%Plot network output versus Shimasaku type
%%
%%'bins' gives the centres of the histogram bins, i.e. Shimasaku types
%%The axes are automatically scaled
%
bins=[0 1 2 3 4 5 6];
plot(t,z,'bo',x2,y2,'g-')
xlabel('Shimasaku Type')
ylabel('Network Type')
grid on
legend('Results','Types equal')
%
%%Hist is a histogram of binned galaxy types (nearest integer)
%%Can compare numbers of each galaxy type: histogram
%%outputs the number of galaxies in each bin
%
% histogram=hist(z(st:fin),bins)
% hist(z(st:fin),bins)
% xlabel('Shimasaku Type')
% ylabel('Number')
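
The statistics and the histogram counts at the end of the listing can also be reproduced without the toolbox. The sketch below is an illustration only: the values of t and z are made up (they are not results from the project), only base Matlab functions are used, and the helper avgrank is a hypothetical name introduced here. It shows one way to obtain Spearman's rank correlation, namely as the Pearson correlation of the average ranks.

```matlab
% Sketch only: RMS error, correlation coefficients and type counts for
% illustrative (made-up) Shimasaku types t and network types z.
t = [0 1 2 3 4 5 6 3 2 5];
z = [0.4 1.1 1.7 3.4 3.9 4.2 5.8 2.6 2.9 5.1];

rmserr = sqrt(mean((z - t).^2));          % RMS difference between z and t

R = corrcoef(t, z);                       % 2-by-2 matrix of Pearson coefficients
pearson = R(1, 2);

% Average rank of each element: the number of smaller values plus the mean
% position within any group of tied values
avgrank = @(x) arrayfun(@(v) sum(x < v) + (sum(x == v) + 1)/2, x);
Rs = corrcoef(avgrank(t), avgrank(z));
spearman = Rs(1, 2);                      % Spearman's rank correlation

% Number of galaxies per integer type 0..6 (nearest integer, clipped to range)
bins = 0:6;
zr = min(max(round(z), bins(1)), bins(end));
counts = zeros(size(bins));
for k = 1:numel(bins)
    counts(k) = sum(zr == bins(k));
end

fprintf('rmserr = %.4f, pearson = %.4f, spearman = %.4f\n', ...
        rmserr, pearson, spearman);
disp(counts)                              % galaxies in each type bin
```

The rounding and clipping step plays the same role as the bin centres passed to hist in the listing: each network type is counted in the bin of the nearest integer type, with values beyond the range falling into the end bins.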
Websites 

The URLs are active as of February 1, 2008. 

Autoclass 
http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass 

Frei & Gunn Galaxy Catalogue 
http://www.astro.princeton.edu/~frei/catalog.htm 

Gene Smith's Astronomy Tutorial 
http://casswww.ucsd.edu/public/tutorial/Galaxies.html 

List of neural network programs available - 1 
ftp://ftp.sas.com/pub/neural/FAQ.html - see Parts 5 and 6. 

List of neural network programs available - 2 
http://www.emsl.pnl.gov:2080/proj/neuron/neural/systems/shareware.html 

LMorpho 
http://www.public.asu.edu/~asusco/documents/lmorpho/INDEX.html 

LMorpho Source Code 
http://www.public.asu.edu/~asusco/documents/lmorpho/dist/index.html 

Los Alamos e-print Archive (Astro-ph), for preprint papers 
http://xxx.soton.ac.uk 

Matlab Neural Network Toolbox documentation 
http://www.mathworks.com/access/helpdesk/help/toolbox/nnet/nnet.shtml 

NASA Astrophysics Data System (ADS), for published papers in journals 
http://ukads.nottingham.ac.uk 

NASA Extragalactic Database Knowledgebase for Extragalactic Astronomy, 
for reviews and historical papers 
http://nedwww.ipac.caltech.edu/level5 



